Synthetic genome

ABSTRACT

The current invention provides a synthetic prokaryotic genome comprising 5 or fewer occurrences of one or more sense codons; and/or a synthetic prokaryotic genome derived from a parent genome, wherein the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of one or more sense codons, relative to the parent genome; and/or a synthetic prokaryotic genome comprising 100 or more, 200 or more, or 1000 or more genes with no occurrences of one or more sense codons.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on May 15, 2023, is named “51689-008003_Sequence_Listing.xml” and is 8,555,414 bytes in size.

FIELD OF THE INVENTION

The present invention relates to synthetic genomes and methods of their production.

BACKGROUND TO THE INVENTION

The design and synthesis of genomes provides a powerful approach for understanding and engineering biology. Genome synthesis has the potential to accelerate metabolic engineering. In particular, genome synthesis has the potential to elucidate synonymous codon function and to facilitate genetically encoded unnatural polymer synthesis (Wang, K., et al., 2016. Nature, 539(7627), 59-64).

The standard genetic code encodes the 20 canonical amino acids using 61 sense codons, and eighteen of the twenty amino acids are encoded by more than one synonymous codon. Nature chooses one sense codon, from up to six synonyms, to encode each amino acid at each position in a gene. Synonymous codon choice can influence mRNA folding, transcriptional and translational regulatory sequences, translation rate, co-translational folding, and protein levels, and has emerging and yet to be understood roles (Wang, K., et al., 2016. Nature, 539(7627), 59-64; and Cambray, G., et al., 2018. Nature biotechnology, 36(10), 1005-1015).
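The codon counts stated above can be checked with a short sketch. The compact table below is the standard genetic code written in TCAG order; the variable names are illustrative only and form no part of the disclosure.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons (TTT, TTC, TTA, ...) to its amino acid or stop.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense codons are all codons that do not signal termination.
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

# Number of synonymous codons available for each amino acid.
synonyms_per_aa = Counter(CODON_TABLE[c] for c in sense_codons)

n_sense = len(sense_codons)
n_degenerate = sum(1 for n in synonyms_per_aa.values() if n > 1)
max_synonyms = max(synonyms_per_aa.values())
serine_codons = sorted(c for c in sense_codons if CODON_TABLE[c] == "S")

print(n_sense, n_degenerate, max_synonyms, serine_codons)
```

Serine is one of the amino acids with six synonyms, which is what leaves room to remove two serine codons while four remain.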

Genome-wide replacement of a target codon with synonymous codons (synonymous codon compression) may provide a foundation for reassigning sense codons to non-canonical amino acids (or other monomers) to facilitate the in vivo biosynthesis of genetically encoded non-canonical biopolymers (Chin, J. W., 2017. Nature, 550(7674), 53-60).

Site-directed mutagenesis approaches have been used to replace up to 321 amber stop codons in the E. coli genome (Mukai, T., et al., 2015. Scientific reports, 5, p. 9699). However, sense codons are commonly orders of magnitude more abundant than stop codons, and genome synthesis, rather than mutagenesis, may be the preferred route to tackling sense codon removal in many cases.

Genome synthesis has enabled the creation of Mycoplasma with synthetic genomes (Gibson, D. G., et al., 2010. Science, 329(5987), 52-56) and the creation of nine strains of S. cerevisiae in which the DNA for one or two of the sixteen chromosomes is replaced by synthetic DNA (Zhang, W., et al., 2017. Science, 355(6329), eaaf3981; and Richardson, S. M., et al., 2017. Science, 355(6329), 1040-1044). These experiments have replaced up to 1 Mb of DNA (0.99 Mb, yeast; 1.08 Mb, Mycoplasma) in individual strains. Replicon excision for enhanced genome engineering through programmed recombination (REXER) has been reported for replacing >100 kb of the E. coli genome with synthetic DNA in a single step. Moreover, it has been shown that REXER can be iterated via genome stepwise interchange synthesis (GENESIS) to replace 220 kb of the E. coli genome with 230 kb of synthetic DNA (Wang, K., et al., 2016. Nature, 539(7627), 59-64; WO 2018/020248).

Genome synthesis has been used to alter synonymous codons in individual genes (Napolitano, M. G., et al., 2016. PNAS, 113(38), E5588-E5597), genomic regions and essential operons (Wang, K., et al., 2016. Nature, 539(7627), 59-64; and Lau, Y. H., et al., 2017. Nucleic acids research, 45(11), 6971-6980). For instance, Wang et al. used defined ‘recoding schemes’ to replace a 20 kb region of the E. coli genome rich in both essential genes and target codons.

However, these studies have mutated only a small fraction (up to 4.7%) of targeted sense codons in the genome of a single strain. Consequently, it is not known whether the application of these methods to genome-wide synonymous codon compression will be able to produce viable genomes. For instance, it is not known whether the defined recoding schemes tested in Wang et al. can be applied genome-wide to create an organism in which a reduced number of sense codons are used to encode the 20 canonical amino acids.

Thus, there is a demand for synthetic genomes, wherein one or more sense codons have been removed. There is also a demand for improved methods to produce synthetic genomes.

SUMMARY OF THE INVENTION

The inventors have surprisingly found that a viable synthetic prokaryotic genome may be produced, wherein one or more sense codons have been removed. In particular, they produced a viable synthetic genome in which the number of codons used to encode cellular protein is reduced from 64 to 61, by genome-wide recoding of two sense codons and one stop codon. They also produced an E. coli host cell comprising said synthetic genome.

The inventors have also surprisingly found that defined recoding and refactoring schemes can enable genome-wide synonymous codon compression for more than 99.9% of target codons. They found that alternative recoding and refactoring at non-tolerated positions enabled genome-wide synonymous codon compression.

The inventors have also surprisingly found that recombination-mediated genetic engineering (e.g. REXER and/or GENESIS) may be combined with directed conjugation to efficiently produce synthetic genomes. In particular, they found, for example, that at least about 4 Mb of DNA can be efficiently replaced by said method and that said method allows failures in the design of synthetic DNA (non-tolerated positions) to be identified at codon-level resolution.

Accordingly, in one aspect the present invention provides a synthetic prokaryotic genome comprising 5 or fewer occurrences of one or more sense codons. In some embodiments the synthetic prokaryotic genome comprises 4 or fewer, 3 or fewer, 2 or fewer, 1 or fewer, or no occurrences of one or more sense codons. In some embodiments the one or more sense codons consist of one sense codon or two sense codons, preferably two sense codons. In some embodiments the synthetic prokaryotic genome comprises no occurrences of two or more sense codons, preferably two sense codons, and no occurrences of one stop codon, preferably the amber stop codon (TAG).

The synthetic prokaryotic genome may be a synthetic bacterial genome, preferably a synthetic Escherichia coli, Salmonella enterica, or Shigella dysenteriae genome. In some embodiments the synthetic prokaryotic genome is 100 kb to 10 Mb, or 1 Mb to 10 Mb, or 2 Mb to 6 Mb in size. The synthetic prokaryotic genome may be viable. In some embodiments the synthetic prokaryotic genome comprises 100 or more, 200 or more, or 1000 or more genes, optionally wherein the genes have no occurrences of the one or more sense codons, preferably wherein the genes are essential genes.

In some embodiments the one or more sense codons are selected from TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA, preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, GCG, GCA, CTG, CTA, TTG, and TTA, more preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, TTG, TTA, GCG and GCA, most preferably the one or more sense codons are TCG and/or TCA.

In some embodiments the synthetic prokaryotic genome comprises 10 or fewer, 5 or fewer, or no occurrences of the amber stop codon (TAG).

In a further aspect the present invention provides a synthetic prokaryotic genome comprising 100 or more, 200 or more, or 1000 or more genes, wherein the genes collectively comprise 5 or fewer occurrences of one or more sense codons, preferably wherein the genes are essential genes. In some embodiments the genes collectively comprise 4 or fewer, 3 or fewer, 2 or fewer, 1 or fewer, or no occurrences of one or more sense codons. In some embodiments the one or more sense codons consist of one sense codon or two sense codons, preferably two sense codons.

The synthetic prokaryotic genome may be a synthetic bacterial genome, preferably a synthetic Escherichia coli, Salmonella enterica, or Shigella dysenteriae genome. In some embodiments the synthetic prokaryotic genome is 100 kb to 10 Mb, or 1 Mb to 10 Mb, or 2 Mb to 6 Mb in size. The synthetic prokaryotic genome may be viable.

In some embodiments the one or more sense codons are selected from TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA, preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, GCG, GCA, CTG, CTA, TTG, and TTA, more preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, TTG, TTA, GCG and GCA, most preferably the one or more sense codons are TCG and/or TCA.

In some embodiments the synthetic prokaryotic genome comprises 10 or fewer, 5 or fewer, or no occurrences of the amber stop codon (TAG).

In a further aspect the present invention provides a synthetic prokaryotic genome derived from a parent prokaryotic genome, wherein the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of one or more sense codons, relative to the parent prokaryotic genome, or wherein the synthetic prokaryotic genome comprises no occurrences of one or more sense codons. In some embodiments the one or more sense codons consist of one sense codon or two sense codons, preferably two sense codons.
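By way of illustration only, the percentage of occurrences of a sense codon remaining in a synthetic genome relative to its parent might be computed as sketched below. The function names `count_codon` and `percent_remaining` are hypothetical, and the sketch assumes that each genome is supplied as a list of annotated open reading frames read in frame from their first base; it is not the procedure used by the inventors.

```python
def count_codon(orfs, codon):
    """Count in-frame occurrences of `codon` across a list of ORF sequences."""
    codon = codon.upper()
    return sum(
        orf.upper()[i:i + 3] == codon
        for orf in orfs
        for i in range(0, len(orf) - 2, 3)  # step through the reading frame
    )


def percent_remaining(parent_orfs, synthetic_orfs, codon):
    """Occurrences of `codon` in the synthetic ORFs as a percentage of the parent."""
    parent_count = count_codon(parent_orfs, codon)
    if parent_count == 0:
        raise ValueError(f"{codon} does not occur in the parent ORFs")
    return 100.0 * count_codon(synthetic_orfs, codon) / parent_count


# Toy example: three in-frame TCG codons in the parent, one left after recoding.
parent = ["ATGTCGTCGTAA", "ATGTCGGCATAA"]
synthetic = ["ATGAGCAGCTAA", "ATGTCGGCATAA"]
print(percent_remaining(parent, synthetic, "TCG"))
```

Only in-frame triplets are counted, so a TCG that merely spans two adjacent codons is not treated as an occurrence of the sense codon.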

The synthetic prokaryotic genome may be a bacterial genome, preferably an Escherichia coli, Salmonella enterica, or Shigella dysenteriae genome. In some embodiments the synthetic prokaryotic genome is 100 kb to 10 Mb, or 1 Mb to 10 Mb, or 2 Mb to 6 Mb in size. The synthetic prokaryotic genome may be viable.

In some embodiments the one or more sense codons are selected from TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA, preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, GCG, GCA, CTG, CTA, TTG, and TTA, more preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, TTG, TTA, GCG and GCA, most preferably the one or more sense codons are TCG and/or TCA, optionally wherein TCG and/or TCA are replaced with synonymous sense codons.

Preferably 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.6% or more, 99.7% or more, 99.8% or more, 99.9% or more, or 100% of the occurrences of the one or more sense codons in the parent prokaryotic genome are replaced with synonymous sense codons. In some embodiments 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.6% or more, 99.7% or more, 99.8% or more, 99.9% or more, or 100% of the occurrences of TCG and/or TCA in the parent prokaryotic genome are replaced with AGC and/or AGT, most preferably 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.6% or more, 99.7% or more, 99.8% or more, 99.9% or more, or 100% of the occurrences of TCG in the parent prokaryotic genome are replaced with AGC and/or 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.6% or more, 99.7% or more, 99.8% or more, 99.9% or more, or 100% of the occurrences of TCA in the parent prokaryotic genome are replaced with AGT.

In some embodiments the synthetic prokaryotic genome comprises 10 or fewer, 5 or fewer, or no occurrences of the amber stop codon (TAG), preferably wherein 90% or more, 95% or more, 98% or more, 99% or more, or all of the occurrences of TAG in the parent prokaryotic genome are replaced with TAA.

In some embodiments 99.9% or more, or 100% of the occurrences of two or more sense codons, preferably two sense codons, in the parent prokaryotic genome are replaced with synonymous sense codons, and all of the occurrences of TAG in the parent prokaryotic genome are replaced with TAA.

One or more pairs of genes which share an overlapping region comprising the one or more sense codons in the parent prokaryotic genome may be refactored, preferably wherein the one or more pairs of genes are those in which replacement of one or more of the sense codons with synonymous sense codons would change the encoded protein sequence of both or either of the pair of genes.

In some embodiments for pairs of genes in opposite orientations, a synthetic insert is inserted between the genes, wherein the synthetic insert comprises the overlapping region; and/or for pairs of genes in the same orientation, a synthetic insert is inserted between the genes, wherein the synthetic insert comprises: (i) a stop codon; (ii) about 20-200 bp from upstream of the overlapping region; and (iii) the overlapping region.

In a further aspect the present invention provides a polynucleotide comprising twenty or more, thirty or more, forty or more, fifty or more, or 100 or more essential genes with no occurrences of one or more sense codons. In some embodiments the one or more sense codons consist of one sense codon or two sense codons, preferably two sense codons.

In some embodiments the one or more sense codons are selected from TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA, preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, GCG, GCA, CTG, CTA, TTG, and TTA, more preferably the one or more sense codons are selected from TCG, TCA, AGT, AGC, TTG, TTA, GCG and GCA, most preferably the one or more sense codons are TCG and/or TCA.

The occurrences of the one or more sense codons in the genes may be replaced with synonymous sense codons, preferably TCG codons are replaced with AGC and/or TCA codons are replaced with AGT.

The essential genes may comprise essential genes selected from one or more of the list consisting of: ribF, lspA, ispH, dapB, folA, imp, yabQ, ftsL, ftsI, murE, murF, mraY, murD, ftsW, murG, murC, ftsQ, ftsA, ftsZ, lpxC, secM, secA, can, folK, hemL, yadR, dapD, map, rpsB, tsf, pyrH, frr, dxr, ispU, cdsA, yaeL, yaeT, lpxD, fabZ, lpxA, lpxB, dnaE, accA, tilS, proS, yafF, hemB, secD, secF, ribD, ribE, thiL, dxs, ispA, dnaX, adk, hemH, lpxH, cysS, folD, entD, mrdB, mrdA, nadD, holA, rlpB, leuS, lnt, glnS, fldA, cydA, infA, cydC, ftsK, lolA, serS, rpsA, msbA, lpxK, kdsB, mukF, mukE, mukB, asnS, fabA, mviN, rne, fabD, fabG, acpP, tmk, holB, lolC, lolD, lolE, purB, minE, minD, pth, prsA, ispE, lolB, hemA, prfA, prmC, kdsA, topA, ribA, fabI, tyrS, ribC, ydiL, pheT, pheS, rplT, infC, thrS, nadE, gapA, yeaZ, aspS, argS, pgsA, yefM, metG, folE, yejM, gyrA, nrdA, nrdB, folC, accD, fabB, gltX, ligA, zipA, dapE, dapA, der, hisS, ispG, suhB, tadA, acpS, era, rnc, lepB, rpoE, pssA, yfiO, rplS, trmD, rpsP, ffh, grpE, csrA, ispF, ispD, ftsB, eno, pyrG, chpR, lgt, fbaA, pgk, yqgD, metK, yqgF, plsC, ygiT, parE, ribB, cca, ygjD, tdcF, yraL, yhbV, infB, nusA, ftsH, obgE, rpmA, rplU, ispB, murA, yrbB, yrbK, yhbN, rpsI, rplM, degS, mreD, mreC, mreB, accB, accC, yrdC, def, fmt, rplQ, rpoA, rpsD, rpsK, rpsM, secY, rplO, rpmD, rpsE, rplR, rplF, rpsH, rpsN, rplE, rplX, rplN, rpsQ, rpmC, rplP, rpsC, rplV, rpsS, rplB, rplW, rplD, rplC, rpsJ, fusA, rpsG, rpsL, trpS, yrfF, asd, rpoH, ftsX, ftsE, ftsY, yhhQ, bcsB, glyQ, gpsA, rfaK, kdtA, coaD, rpmB, dfp, dut, gmk, spoT, gyrB, dnaN, dnaA, rpmH, rnpA, yidC, tnaB, glmS, glmU, wzyE, hemD, hemC, yigP, ubiB, ubiD, hemG, yihA, ftsN, murI, murB, birA, secE, nusG, rplJ, rplL, rpoB, rpoC, ubiA, plsB, lexA, dnaB, ssb, alsK, groS, psd, orn, yjeE, rpsR, chpS, ppa, valS, yjgP, yjgQ, and dnaC.

In a further aspect the present invention provides a prokaryotic host cell comprising a synthetic prokaryotic genome according to the present invention or a polynucleotide according to the present invention.

The prokaryotic host cell may be viable. The prokaryotic host cell may be a bacterial cell, preferably an Escherichia coli, Salmonella enterica, or Shigella dysenteriae cell. Preferably the host cell is suitable for use in production of polypeptides comprising one or more non-proteinogenic amino acids, preferably two or more non-proteinogenic amino acids, most preferably three or more non-proteinogenic amino acids.

In a further aspect the present invention provides use of a prokaryotic host cell according to the present invention for producing polypeptides comprising one or more non-proteinogenic amino acids, preferably two or more non-proteinogenic amino acids, most preferably three or more non-proteinogenic amino acids.

In a further aspect the present invention provides a method for producing a synthetic genome comprising:

(a) providing a parent genome;
(b) carrying out one or more rounds of recombination-mediated genetic engineering on the parent genome, to produce two or more different partially synthetic genomes; and
(c) carrying out one or more rounds of directed conjugation with the two or more different partially synthetic genomes to produce a synthetic genome;
wherein the partially synthetic genomes each comprise a synthetic region that has 50 or fewer, 20 or fewer, 10 or fewer, 5 or fewer, or 0 occurrences of each of one or more sense codons; or wherein the partially synthetic genomes each comprise a synthetic region that has less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of each of one or more sense codons, relative to the corresponding region in the parent genome.

The synthetic regions may collectively cover 90% or greater, 95% or greater, 99% or greater or 100% of the parent genome. In some embodiments the synthetic regions are 10-1000 kb, 50-1000 kb, 100-1000 kb, or 100-500 kb in size.

The method may further comprise testing the viability of the partially synthetic genomes after each round of recombination-mediated genetic engineering and/or after each round of directed conjugation.

The two or more different partially synthetic genomes may comprise at least one partially synthetic donor genome and at least one partially synthetic recipient genome. In some embodiments the at least one partially synthetic donor genome comprises a synthetic region and a first selectable marker flanked by two homology regions immediately downstream of an origin of transfer; and the at least one partially synthetic recipient genome comprises a second selectable marker flanked by two corresponding homology regions, optionally wherein the first selectable marker comprises a positive selectable marker, and/or the second selectable marker comprises a negative selectable marker. In some embodiments the synthetic region present in the at least one partially synthetic recipient genome is outside the region flanked by the homology regions. In some embodiments the method further comprises one or more rounds of selection for the selectable markers.

The one or more rounds of recombination-mediated genetic engineering may comprise one or more rounds of replicon excision for enhanced genome engineering through programmed recombination (REXER).

The synthetic genome may be a synthetic prokaryotic genome according to the present invention.

In a further aspect the present invention provides a synthetic prokaryotic genome produced by the method of the present invention.

DESCRIPTION OF DRAWINGS

FIGS. 1A-1D—Design of the synthetic genome implementing a defined recoding scheme for synonymous codon compression.

FIG. 1A, The defined recoding scheme for synonymous codon compression. Synonymous serine codons and three stop codons used in the genome of WT E. coli are shown. Systematically implementing a defined recoding scheme for synonymous codon compression recodes target codons to defined synonyms, and replaces the amber stop codon TAG with the ochre stop codon TAA. This creates an organism with a recoded genome that uses a reduced number of serine and termination codons.
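As a minimal sketch of how such a defined recoding scheme acts on a single open reading frame, the TCG→AGC, TCA→AGT and TAG→TAA substitutions may be applied codon by codon. The function name `recode_orf` is hypothetical, and the sketch assumes an isolated, non-overlapping ORF supplied in frame; overlapping genes require the refactoring described for FIGS. 1B and 1C.

```python
# Defined recoding scheme: two serine codons and the amber stop codon.
RECODING_SCHEME = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_orf(orf, scheme=RECODING_SCHEME):
    """Replace each in-frame target codon of an ORF with its defined synonym."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    # Split into in-frame codons so that out-of-frame matches are left untouched.
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(scheme.get(codon, codon) for codon in codons)


print(recode_orf("ATGTCGTCATAG"))  # Met-Ser-Ser-stop, recoded
print(recode_orf("ATGATCGAATAA"))  # TCG spans two codons here, so nothing changes
```

Because every substitution is synonymous (serine to serine, stop to stop), the encoded protein sequence is unchanged.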

FIG. 1B, Refactoring of 3′, 3′ overlaps enables their independent recoding. The overlap between two open reading frames (ORF-1 and ORF-2) is duplicated, creating a synthetic insert. This enables independent recoding of ORFs.
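The duplication step for a 3′, 3′ overlap can be sketched as follows. The function name `duplicate_overlap`, the half-open coordinate convention and the toy sequence are assumptions made for illustration: ORF-1 is taken to end at `overlap_end` and ORF-2, on the opposite strand, to begin at `overlap_start`.

```python
def duplicate_overlap(genome, overlap_start, overlap_end):
    """Duplicate genome[overlap_start:overlap_end] immediately after itself.

    After duplication the first copy belongs only to ORF-1 and the second
    copy (the synthetic insert) belongs only to ORF-2.
    """
    if not 0 <= overlap_start < overlap_end <= len(genome):
        raise ValueError("overlap coordinates out of range")
    overlap = genome[overlap_start:overlap_end]
    return genome[:overlap_end] + overlap + genome[overlap_end:]


# Toy coordinates: ORF-1 spans [2, 18) and ORF-2 spans [12, 27), overlapping in [12, 18).
genome = "ACGTACGTAGCTAGCTAGGATCCATCGATCGGA"
refactored = duplicate_overlap(genome, 12, 18)

orf1_after = refactored[2:18]             # ORF-1 keeps its original coordinates
orf2_after = refactored[18:18 + 27 - 12]  # ORF-2 now starts at the inserted copy
print(orf1_after == genome[2:18], orf2_after == genome[12:27])
```

Both ORF sequences are preserved and no longer share any bases, so each can then be recoded without altering the other.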

FIG. 1C, Refactoring 5′, 3′ overlaps. The overlap plus 20 bp upstream is duplicated to generate a synthetic insert. When the overlap is longer than 1 bp at the end of the upstream ORF, an in-frame TAA is introduced at the beginning of the synthetic insert; this in-frame stop codon ensures termination of translation from the original RBS. Thus, all full-length translation of the downstream ORF is initiated from the reconstructed RBS in the synthetic insert.

FIG. 1D, Map of the synthetic genome design with all TCG, TCA and TAG codons removed. Outer ring: 18,218 positions of all TCG→AGC, TCA→AGT and TAG→TAA recoding. Grey ring: 12 positions of designed silent mutations in overlaps, 21 refactoring of 3′, 3′ overlaps (FIG. 1B) and 58 refactoring of 5′, 3′ overlaps (FIG. 1C). The two inner rings illustrate the genome sections. Outer ring: the eight genome sections (A-H) of the synthetic genome design. Inner ring: 37 fragments of approximately 100 kb each. Fragment 37 is shown as 37a and 37b to reflect the final assembly. oriC: Origin of replication.

FIGS. 2A-2C—Retrosynthesis of the synthetic genome.

FIG. 2A, Disconnecting the genome into eight sections. The synthetic genome was disconnected into sections A-H, with each section corresponding to approximately 0.5 Mb (step 1). The position of the replication origin oriC (orange square) is indicated. Sections were assembled into a completely recoded genome (in the forward sense, opposite direction of retrosynthesis arrow) by directed conjugation (FIG. 10 and FIGS. 11A and 11B).

FIG. 2B, Disconnecting genome sections into 100 kb fragments. Sections are further disconnected into four to five fragments of around 100 kb each. Section A is depicted, and other sections were treated similarly. Nearly all sections were constructed entirely through consecutive REXER steps (FIG. 3), i.e. by GENESIS (FIG. 4). Each step replaced around 100 kb of wild-type genomic sequence with 100 kb of synthetic fragment (steps 2 and 3). Double selection markers composed of a negative selection marker −1 (rpsL) and a positive selection marker +1 (Kan^(R)), and of a negative selection marker −2 (sacB) and a positive selection marker +2 (Cm^(R)), were used in alternating rounds of REXER to realize GENESIS.

FIG. 2C, Disconnecting each 100 kb synthetic fragment into 10 kb synthetic stretches. Each 100 kb synthetic fragment is further disconnected into 9 to 14 short synthetic stretches of around 10 kb in length (step 4). The BACs carrying 100 kb synthetic fragments were assembled by homologous recombination in yeast. Each BAC contains Cas9 cleavage sites (black triangles) enabling excision of the synthetic DNA in vivo, homology regions (HR1 and HR2) for targeting recombination, the appropriate double selection cassette (+2,−2 indicated) for selecting during REXER and GENESIS, a negative selection marker (−1 indicated) to enable loss of the backbone following REXER, a BAC YAC origin and URA3 marker for maintenance in E. coli and S. cerevisiae.

FIG. 3—Using 100 kb fragments of synthetic DNA to replace the corresponding regions in the genome through REXER.

REXER (replicon excision for enhanced genome engineering through programmed recombination) utilizes CRISPR/Cas9 and lambda-red mediated recombination to replace genomic DNA with synthetic DNA provided from an episome (BAC). This enables large regions of the genome (>100 kb) to be replaced by synthetic DNA (Wang, K., et al., 2016. Nature, 539(7627), 59-64; WO 2018/020248). The black triangles denote the location of CRISPR protospacers, which are cleaved by Cas9 to liberate the synthetic DNA (pink) cassette from the BAC flanked by homology regions (HRs). Homology regions 1 and 2 (HR1, HR2) program the location of recombination into the E. coli genome. Selection cassette −1/+1 ensures the integration of the synthetic DNA, while selection cassette −2/+2 on the genome ensures the removal of the corresponding wt DNA. In the example shown in the figure, +1 is Kan^(R), −1 is rpsL, +2 is Cm^(R), −2 is sacB.

FIG. 4—GENESIS enables the stepwise replacement of genomic DNA by synthetic DNA to generate recoded sections.

Iterative cycles of REXER (see FIG. 3), with alternating choices of positive and negative selection cassettes, enable genome stepwise interchange synthesis (GENESIS) (Wang, K., et al., 2016. Nature, 539(7627), 59-64). This enables large sections of the synthetic genome to be assembled through the iterative addition of fragments that replace the corresponding genomic sequence, in a clockwise manner. The first REXER of a 100 kb synthetic fragment of DNA leaves a −1/+1 selection cassette on the genome which acts as a landing site for the downstream integration of a second fragment of synthetic DNA harbouring a −2/+2 selection cassette. In the example shown, +1 is Kan^(R), −1 is rpsL, +2 is Cm^(R), −2 is sacB, but the same logic can be used with different permutations of markers on the genome and the BAC.

FIGS. 5A-5D—Recoding ftsI-murE and map in fragment 1.

FIG. 5A, Recoding landscape of fragment 1. We sequenced six clones post-REXER. Each dot represents the frequency of recoding within the sequenced clones (y axis) for a target codon at the indicated position in the genome (x axis). Black dots indicate positions where we did not observe recoding. Four codons and a refactoring of ftsI-murE and one codon in map were rejected.
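A recoding landscape of this kind can be tabulated as sketched below. The function names and the input layout (one set of recoded genome positions per sequenced clone) are assumptions made for illustration, and the toy positions are not data from the figures.

```python
def recoding_landscape(target_positions, clones):
    """Fraction of sequenced clones in which each target codon position is recoded.

    `clones` is a list with one set per sequenced clone, holding the genome
    positions at which that clone carries the recoded codon.
    """
    if not clones:
        raise ValueError("at least one sequenced clone is required")
    n_clones = len(clones)
    return {
        pos: sum(pos in clone for clone in clones) / n_clones
        for pos in target_positions
    }


def never_recoded(landscape):
    """Positions with a recoding frequency of zero (the black dots)."""
    return sorted(pos for pos, freq in landscape.items() if freq == 0)


# Toy example with three target codons and three sequenced clones.
targets = [100, 250, 400]
clones = [{100, 250}, {100}, {100, 250}]
landscape = recoding_landscape(targets, clones)
print(landscape)
print(never_recoded(landscape))
```

Positions returned by `never_recoded` correspond to candidate non-tolerated positions in the design, which can then be examined individually.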

FIG. 5B, Refactoring the 14 bp ftsI-murE overlap. The codons and overlaps are grey scaled by their post-REXER replacement frequency in the clones sequenced. Using our initial refactoring scheme (1), in which the overlap plus 20 bp of upstream sequence was duplicated, we did not observe replacement of the overlap by synthetic DNA (in the six clones sequenced post-REXER). Refactoring scheme 2, which duplicates the overlap plus 182 bp of upstream sequence, resulted in complete recoding of this region in 12 of 16 post-REXER clones sequenced.

FIG. 5C, Testing alternative codons at Ser4 in map. A double-selection marker, pheS*-Hyg^(R) on a constitutive EM7 promoter, was introduced upstream of map followed by a RBS. We replaced the cassette using linear double stranded DNA that introduces alternative codons at position four (as indicated), via lambda red recombination and negative selection for loss of pheS*. DNA with AGC and AGT did not integrate (0/16 clones); we recovered one clone for AGC, but sequencing revealed it contained a mutant AAC (Asn) codon. TCT (6/8), TCC (6/16), ACA (6/8), and TTA (4/8) were allowed.

FIG. 5D, Recoding landscape over the genomic region shown in FIG. 5A following REXER with a BAC containing Refactoring scheme 2 for the ftsI-murE overlap and TCT at position 4 in map. 2/7 post-REXER clones were completely refactored and recoded, and each target codon was replaced in at least 5/7 clones. The data from FIG. 5A is shown for comparison.

FIGS. 6A-6D—Recoding rne and yceQ in fragment 9.

FIG. 6A, Recoding landscape of fragment 9. Our designed, synthetic sequence of fragment 9 was integrated into the genome by REXER and 19 clones were completely sequenced by NGS. The recoding landscape graph shows the frequency at which each target codon was recoded across the 19 clones. While most codon replacements were accepted, recoding of a 26 kb region was consistently rejected; codon positions with a recoding frequency of zero in all the sequenced clones are indicated by black dots. To pinpoint the problematic sequence, 10 kb stretches of the genome (G2-7) were deleted in the presence of the episomal copy of synthetic fragment 9. The synthetic sequence was sufficient to support deletion of all stretches except G4 (dark grey box), suggesting that the underlying problem is within this stretch. 0/19 clones were completely recoded.

FIG. 6B, Recoding landscape of stretch G4. Following REXER across the 10 kb stretch ‘G4’ and sequencing of ten clones the recoding landscape shown was generated. This revealed a clear recoding minimum at yceQ, a ‘gene’ that encodes a predicted protein, for which there is no evidence of transcription, protein synthesis or homologs (Pundir, S., et al., 2017. Methods Mol Biol, 1558, 41-55). All target codons in yceQ were recoded at least once in individual clones, but never simultaneously; thus, the minimum of the recoding landscape does not go to zero, and 0/10 clones were completely recoded. This is consistent with epistasis between the targeted positions. In the map below the recoding landscape, sequences annotated as essential and target codons are shown. The sequence position (x axis) is with reference to FIG. 6A.

FIG. 6C, Altered design of region surrounding rne in fragment 9. Top, original design of yceQ recoding and rne (encoding RNAse E) regulatory sequences. Target codons are shown. Prne1,2,3, are the promoters for the essential gene rne; these are found in and around the hypothetical gene yceQ. The −10 sequence of the major promoter P1rne is mutated by our initial design. Sequence containing hairpin 1 (hp1) and hairpin 2 (hp2) that bind to RNAse E to mediate transcript degradation are shown; this sequence encompasses the remaining target codons and is also mutated by our initial design. Bottom, The second codon in yceQ was replaced with a stop codon and the remaining target codons retained their original sequence. The sequence position (x axis) is with reference to FIG. 6A.

FIG. 6D, This modified fragment 9, from FIG. 6C, was integrated on the genome, resulting in complete recoding in 4/5 clones sequenced. The axes of the graph are the same as in FIG. 6A. The recoding landscape for the modified fragment 9, derived from sequencing 5 clones, is shown in purple. The data from FIG. 6A is reproduced for comparison.

FIGS. 7A-7D—Recoding yaaY in fragment 37a.

FIG. 7A, Recoding landscape of fragment 37a. Our designed, synthetic sequence of fragment 37a was integrated into the genome by REXER and 6 clones were completely sequenced by NGS. While most codon replacements were accepted, recoding of a 6.5 kb region was consistently rejected. Target codon positions that were never recoded in the six clones sequenced are indicated by black dots.

FIG. 7B, Identification of the problematic target codon. Within the identified 6.5 kb problematic region we first focused on codons in essential genes (dark grey arrows) over non-essential genes (light grey arrows). Sanger sequencing (black bar) of 24 clones showed that 2 clones were recoded in all 6 target codons within a sub-section of the essential genes. Further Sanger sequencing of the remaining target codons in essential genes in these two clones revealed that 1 clone was recoded at all 17 target codons. This clone was completely sequenced by NGS and used to generate a recoding landscape, in which each target codon is either recoded or not recoded. This allowed us, in combination with the recoding landscape in FIG. 7A, to identify a problematic region 1.8 kb upstream of ribF. Here we focused on the 4 target codons in the genes rpsT and yaaY as the nearest codons to the essential ribF gene. Sanger sequencing of 33 clones across this sequence revealed only 1 codon that was never recoded, the codon for Ser70 in the hypothetical gene yaaY (sequencing results are shown as grey scaled on the gene map of rpsT and yaaY). We therefore investigated alternative codon replacements in yaaY.

FIG. 7C, Alternative codon replacement in the hypothetical gene yaaY. At position Ser70 in this gene, replacement of TCA with AGT was not successful. To investigate alternative codon replacement schemes, a double-selection marker, pheS*-Hyg^(R) on a constitutive EM7 promoter followed by an RBS was introduced into yaaY 12 bp upstream of the codon for Ser70. The negative selection marker was then used to select for clones that had replaced the cassette using linear double stranded DNA that introduces alternative codons at position seventy, via lambda red recombination. While linear double stranded DNA with AGT did not integrate (0/16 clones), integration of dsDNA with TCC (2/16), TCG (2/16), TCT (6/16) and AGC (9/16) proved viable.

FIG. 7D, Recoding landscape of REXER with a BAC containing a correctedversion of fragment 37a, bearing AGC at position Ser70 in thehypothetical gene yaaY. When integrated by REXER, we identified 1/7completely recoded clones. AGC at position Ser70 in yaaYwas introducedin 4/7 clones.

FIGS. 8A and 8B—Substitutions in the hypothetical gene yceQ overlap with regulatory elements in rne that encodes the essential protein RNAse E.

FIG. 8A, In our original design, a programmed substitution of a TCA to AGT in the hypothetical gene yceQ leads to mutation of the −10 promoter element of P1rne (boxed). The transcriptional start site (tss) of this promoter, for rne transcription, is indicated by an arrow; this is the major promoter for rne transcription.

FIG. 8B, Target codon substitutions overlap with and may potentially disrupt the key regulatory hairpins hp2 and hp3 in the long 5′ UTR of the rne transcript. hp2 and hp3 mediate the regulatory feedback loop in which RNAse E is recruited to the mRNA to promote degradation of its own transcript. Shown is a schematic of the wild-type secondary structure of the rne 5′ UTR (Diwa, A., et al., 2000 Genes Dev 14, 1249-1260). The target codons for synonymous replacement are highlighted.

FIGS. 9A and 9B—Completing Sections A-B and H.

FIG. 9A, GENESIS was initiated with fragment 4 and proceeded smoothly until fragment 9, in which we were unable to recode yceQ. Identifying and fixing the problems with our initial design of fragment 9 was carried out as described in FIGS. 6A-6D, by means of introducing a stop codon at the start of the predicted yceQ ORF. Following a swap of the sacB-Cm^(R) (sC) double selection cassette at the end of fragment 9 for a pheS*-Hyg^(R) (pH) double selection cassette, this strain was ready to act as the recipient for conjugation to assemble a strain in which fragments 4-13 (sections A+B) are fully recoded. In parallel, we continued to recode the strain containing the recoded fragments 4 to incomplete fragment 9 by GENESIS; this generated a second strain for assembly in which fragments 4-8 and 10-13 were completely recoded, and fragment 9 was partially recoded. We then integrated oriT 3 kb upstream of the start of fragment 10 in the second strain to generate a donor for conjugation to assemble a strain in which fragments 4-13 (sections A+B) are fully recoded. Conjugation of the donor and recipient strains resulted in a strain in which sections A and B are fully recoded.

FIG. 9B, Individual REXER of fragments 37a and 1 led to incomplete recoding. We carried out troubleshooting of both independently (FIGS. 5A-5D, FIGS. 7A-7D). The repairs are indicated. Each strain then served as a starting point for two independent sets of GENESIS: one generated 37a-37b (on the left) and ended with an rpsL-Kan^(R) (rK) cassette, and one generated 1-3 (on the right) and ended with a sacB-Cm^(R) cassette. We integrated an oriT 3 kb upstream of the start of fragment 1, and this strain served as a donor for the directed conjugation of 1-3 into 37a-37b. The correct product was selected for by the gain of Cm^(R) and the loss of rpsL. This resulted in the completion of section H in a single strain.

FIG. 10 —Assembly of an organism with a fully synthetic genome via conjugation of recoded genome sections.

Synthetic genomic sections from multiple, individual partially-recoded genomes were assembled into a single, fully-recoded genome via conjugation (Ma, N. J., et al., 2014. Nat Protoc 9, 2285-2300). The donor (d) and recipient (r) strains harbour unique recoded genomic sections; recoded overlapping homology regions (3 kb to 400 kb) were utilized to seamlessly recombine the strains. Small homology regions ranging from 3-5 kb are denoted with an asterisk (*). Conjugations for which we used greater than 5 kb homology (HR) are indicated with text. For assembly, the recoded genomic content from the donor was conjugated in a clockwise manner to replace the corresponding wt genomic section in the recipient. The origin of strains AB and H is described in detail in FIGS. 9A and 9B, while all other individual synthetic genomes were generated by GENESIS (FIG. 4). Conjugation followed by recombination proceeded until the final, fully-recoded, A-H strain was assembled and sequence verified by NGS.

FIGS. 11A and 11B—Assembly of recoded genome sections into a fully-recoded organism.

FIG. 11A, Schematic assembly of partially synthetic donor and recipient genomes into a more synthetic genome, through conjugation. In the recipient cell, the recoded genome section is extended with recoded DNA, commonly 3-4 kb, by a lambda red mediated recombination and positive and negative selection; this step takes advantage of the genomic markers at the end of the recoded sequence that are introduced by GENESIS, and provides a homology region with the end of the recoded fragment in the donor strain. The donor strain is prepared by integration of an origin of transfer (oriT) at the end of the recoded DNA. The indicated positive and negative selections ensure the survival of recipient strains, and select for recipients that have successfully integrated the synthetic DNA from the donor. An F′ plasmid containing a mutation in the oriT sequence that makes it non-transferable was used to facilitate conjugation of the donor genome to the recipient. +2, Cm^(R); −2, SacB; +3, Hyg^(R); −3, pheS*; +4, Gentamycin^(R); +5, Tetracycline^(R).

FIG. 11B, Synthetic genomic sections from multiple, individual partially-recoded genomes were assembled into a single, fully-recoded genome via the indicated sequence of conjugations. The donor (d) and recipient (r) strains harbor unique recoded genomic sections. The recoded genomic content from the donor was conjugated in a clockwise manner to replace the corresponding WT genomic section in the recipient. Conjugation proceeded until the final, fully-recoded A-H strain was assembled. FIG. 10 shows the process in more detail, including all homology regions.

FIGS. 12A-12C—Functional consequences of synonymous codon compression in Syn61.

FIG. 12A, Synonymous codon compression and deletion of prfA, serU and serT. The grey box shows the serine codons and stop codons, together with the tRNAs and release factors that decode them in WT E. coli (WT genome). tRNA anticodons and release factors are connected to the codons they read by black lines. The tRNA and release factor genes are shown in the black boxes. serT is the sole tRNA that decodes TCA codons in WT E. coli, and is absolutely essential. Synonymous codon compression (Syn. Codon. Comp.) leads to a recoded genome in which i) tRNAs with CGA anticodons should have no cognate codons and ii) serT should be dispensable. All factors that read the target codons should be dispensable in Syn61.

FIG. 12B, Co-translational incorporation of the non-canonical amino acid (ncAA) Nε-(((2-methylcycloprop-2-en-1-yl) methoxy) carbonyl)-L-lysine (CYPK), using the orthogonal MmPylRS/tRNA^(Pyl) _(CGA) pair, was toxic in MDS42 but not Syn61. When provided with CYPK, this pair will incorporate the ncAA in response to TCG codons in a dose dependent manner. In MDS42 this incorporation leads to mis-synthesis of the proteome and toxicity. However, in Syn61, which does not contain TCG codons, this is non-toxic. The lines follow the mean of three biological replicates (each shown as a dot) at each [CYPK] (0 mM, 0.5 mM, 1 mM, 2.5 mM and 5 mM). “% Max Growth” was determined by the final OD₆₀₀ with the indicated concentration of CYPK divided by the final OD₆₀₀ in the absence of CYPK. Final OD₆₀₀s were determined after 600 min.

FIG. 12C, Synonymous codon compression enables deletion of serT in Syn61. PCR flanking the serT locus before (−) and after (clones 1 and 2) replacement with a PheS*-Hyg^(R) cassette. Also see FIGS. 14A-14F. Full gels in FIG. 16.

FIGS. 13A-13D—Characterization of an organism with a fully synthetic genome.

FIG. 13A, Doubling times for Syn61 and MDS42. Our fully synthetic, recoded E. coli Syn61 has a doubling time 1.6 times that of the parent strain MDS42 (Posfai, G. et al., 2006. Science 312, 1044-1046) when grown in standard media conditions (90.1 min vs. 57.6 min in LB+2% glucose). The ratio of growth rates between Syn61 and MDS42 in LB (decreased carbon catabolite repression) at 37° C. is 1.7, in M9 minimal media is 1.7, in richer media (2×TY) is 1.4, in LB at 25° C. is 2.5, and in LB at 42° C. is 1.3. Listed are the doubling times for MDS42 and Syn61, respectively, in different media conditions: LB at 37° C., 58.3 min, and 100.6 min; LB+2% Glucose, 57.6 min, and 90.1 min; M9 minimal media, 130.5 min, and 221.1 min; 2×TY, 68.2 min, and 92.6 min; LB at 25° C., 86.3 min, and 218.4 min; LB at 42° C., 77.4 min, and 99.7 min. Syn61 harboring a plasmid without (−) or with (+) serV exhibited a growth rate ratio of 0.99 (138.3 min vs. 136.2 min). Doubling times represent the average of ten independently grown biological replicates of each strain ± standard deviation from the mean (see Methods).
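As a worked check only, the ratios quoted for FIG. 13A can be reproduced from the listed doubling times. The sketch below assumes that each quoted ratio is the Syn61 doubling time divided by the MDS42 doubling time, rounded to one decimal place; this reading is an assumption, and the variable names are illustrative.

```python
# Doubling times in minutes, as listed for FIG. 13A: (MDS42, Syn61).
doubling_times_min = {
    "LB, 37 C": (58.3, 100.6),
    "LB + 2% glucose": (57.6, 90.1),
    "M9 minimal": (130.5, 221.1),
    "2xTY": (68.2, 92.6),
    "LB, 25 C": (86.3, 218.4),
    "LB, 42 C": (77.4, 99.7),
}

# Assumed definition: ratio = Syn61 doubling time / MDS42 doubling time.
ratios = {
    condition: round(syn61 / mds42, 1)
    for condition, (mds42, syn61) in doubling_times_min.items()
}

for condition, ratio in ratios.items():
    print(f"{condition}: {ratio}")
```

Under this assumption the computed values agree with the ratios stated in the figure description for all six conditions.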

FIG. 13B, Representative microscopy images of E. coli strains MDS42 and Syn61. Samples were imaged on an upright Zeiss Axiophot phase contrast microscope using a 63× 1.25 NA Plan Neofluar phase objective (see Methods).

FIG. 13C, Histogram of cell lengths quantified from microscopy images of strains MDS42 and Syn61. The mean cell length for MDS42 was 1.97±0.57 μm and for Syn61 was 2.3±0.74 μm. Images of n=500 cells were taken during exponential growth phase for both strains. Cell length measurements were made with Nikon NIS Elements software (see Methods).

FIG. 13D, Label-free quantification of the MDS42 and Syn61 proteomes. Each strain was grown in three biological replicates. Each biological replicate was analysed by tandem mass spectrometry in technical duplicate. Technical duplicates of biological replicates were merged. A total of 1,084 proteins were quantified across the samples. P-values for abundance differences were calculated by two-sample t-test for the proteins quantified in at least two biological replicates. The data showed that the abundance of three proteins was significantly (P=0.01) different between the strains: Aminopeptidase N (P04825) and peptidase T (P29745) were overrepresented in Syn61, while 30S ribosomal protein S20 (P0A7U7) was underrepresented. No protein differed in abundance, as judged by LFQ values, by more than 1.14 fold between strains.

FIGS. 14A-14F—Consequences of synonymous codon compression in Syn61.

FIG. 14A, Synonymous codon compression and deletion of prfA, serU and serT in E. coli. The grey box shows the E. coli serine codons and stop codons, together with the tRNAs and release factors that decode them in WT E. coli (WT genome). tRNA anticodons and release factors are connected to the codons they read by black lines. The tRNA and release factor genes are shown in the black boxes. Synonymous codon compression (Syn. Codon. Comp.) leads to Syn61 cells with a recoded genome in which TCG and TCA codons are removed. The abundance of each codon is listed in its box.

FIG. 14B, As in FIG. 12B, but with the indicated MmPylRS/tRNA^(Pyl) anticodon, UGA. There are fewer cognate codons for this tRNA in Syn61 than in MDS42; therefore CYPK addition might be expected to be less toxic in Syn61, as observed.

FIG. 14C, As in FIG. 12B, but with the indicated MmPylRS/tRNA^(Pyl) anticodon, GCU. There are more cognate codons for this tRNA in Syn61 than in MDS42; therefore CYPK addition might be expected to be more toxic in Syn61, as observed.

FIG. 14D, serT (dark grey) is deleted by insertion of a PheS*-Hyg^(R) cassette (black) via lambda-red mediated recombination. Recombination yields new junctions 1 and 2, as indicated. For each recombination, both junctions were sequence-verified by Sanger sequencing. Above the Sanger chromatograms, the arrows indicate the precise location of the junction, the sequence corresponding to the selection cassette and the bar corresponds to the genomic sequence flanking the selection cassette. The primers used to generate selection cassettes with suitable homologies to serU, serT and prfA for recombination are provided in FIG. 23.

FIG. 14E, prfA (dark grey) is deleted by insertion of an rpsL-Kan^(R) cassette (in black) via lambda-red mediated homologous recombination. The agarose gels are annotated as described in FIG. 12C and the rest of the data is annotated as described in FIG. 14D. Full gel available in FIG. 16.

FIG. 14F, serU (dark grey) is deleted by insertion of a PheS*-Hyg^(R) cassette (in black) via lambda-red mediated recombination. The agarose gels are annotated as described in FIG. 12C and the rest of the data is annotated as described in FIG. 14D. Full gel available in FIG. 16.

FIGS. 15A-15C—The scale of genome synthesis and scale and fidelity of recoding.

FIG. 15A, Genome and chromosome synthesis. The sizes (Mb) of synthetic genomes that have been produced for M. genitalium and M. mycoides (Gibson, D. G. et al., 2008. Science 319, 1215-1220; and Gibson, D. G. et al., 2010. Science 329, 52-56) and several S. cerevisiae chromosomes (Shen, Y. et al., 2017. Science 355, aaf4791; Annaluru, N. et al., 2014. Science 344, 55-58; Xie, Z. X. et al., 2017. Science 355, aaf4704; Mitchell, L. A. et al., 2017. Science 355, aaf4831; Dymond, J. S. et al., 2011. Nature 477, 471-476; Wu, Y. et al., 2017. Science 355, aaf4706; Zhang, W. et al., 2017. Science 355, aaf3981; and Richardson, S. M. et al., 2017. Science 355, 1040-1044) are shown in light grey. The size of the synthetic E. coli genome presented here is shown in dark grey.

FIG. 15B, Genome recoding efforts. Attempts to recode target codons TTA and TTG in S. typhimurium (Lau, Y. H. et al., 2017. Nucleic Acids Res 45, 6971-6980); AGC, AGT, TTG, TTA, AGA, AGG, and TAG in E. coli (Ostrov, N. et al., 2016. Science 353, 819-822); AGA and AGG in E. coli (Napolitano, M. G. et al., 2016. Proc Natl Acad Sci USA 113, E5588-5597), as well as recoding of all TAG in E. coli (Lajoie, M. J. et al., 2013. Science 342, 357-360) are shown in light grey, compared to the removal of all TCA, TCG, and TAG in E. coli presented here (dark grey). The total number of codons recoded in a single strain is shown on the graph, and the maximum percentage of target codons recoded in a single strain in each effort is indicated.

FIG. 15C, Number of reported non-programmed mutations and indels as a function of the number of target codons recoded for the experiments shown in FIG. 15B.

FIG. 16 —Full gels for FIGS. 12A-12C

Full gels are shown with the corresponding Figure panel. The molecular size standards are annotated and the area shown in the relevant Figure is indicated by a white outline.

FIG. 17 —Codon and anticodon interactions in the E. coli genome

28 sense codons are highlighted in grey, along with the amber stop codon. The genome wide removal of these sense codons, but not other sense codons, would enable all their cognate tRNAs to be deleted without removing the ability to decode one or more sense codons remaining in the genome. This is necessary but not sufficient for the reassignment of sense codons to unnatural monomers. Serine, leucine and alanine codon boxes are highlighted because the endogenous aminoacyl-tRNA synthetases for these amino acids do not recognize the anticodons of their cognate tRNAs. This may facilitate the assignment of codons within these boxes to new amino acids through the introduction of tRNAs bearing cognate anticodons that do not direct mis-aminoacylation by endogenous synthetases. The total codon counts for all 64 triplet codons in the MDS42 genome (GenBank accession no. AP012306), all known codon-anticodon interactions through both Watson-Crick base-pairing and wobbling, base modification on tRNA anticodons, tRNA genes, and measured in vivo tRNA relative abundance are reported. This analysis identifies 10 codons from the serine, leucine, and alanine groups (serine codons TCG, TCA, AGT, AGC; leucine codons CTG, CTA, TTG, TTA; and alanine codons GCG, GCA) that satisfy both the codon-anticodon interaction and aminoacyl-tRNA synthetase recognition criteria for codon reassignment.

FIG. 18 —Designed synthetic E. coli genome (SEQ ID NO: 1)

A version of the E. coli MDS42 genome in which the serine codons TCG and TCA and the stop codon TAG in open reading frames (ORFs) are systematically replaced by their synonyms AGC, AGT, and TAA, respectively. Using the defined rules for synonymous codon compression and refactoring, a genome is designed in which all 18,218 target codons are recoded to their target synonyms.
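Purely by way of illustration, and not as a description of the design software actually used, the replacement scheme stated for FIG. 18 can be sketched in Python for a single in-frame ORF. The function name `recode_orf` and the example sequence are hypothetical, and the sketch does not handle overlapping genes or regulatory elements, which required refactoring in the actual design.

```python
# Replacement scheme taken from the text: TCG -> AGC, TCA -> AGT, TAG -> TAA.
RECODING_SCHEME = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_orf(orf: str) -> str:
    """Replace target codons with their synonyms, reading in frame from position 0."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    # Codons that are not targets are copied through unchanged.
    return "".join(RECODING_SCHEME.get(codon, codon) for codon in codons)


# Hypothetical ORF: ATG TCG TCA AAA TAG (Met-Ser-Ser-Lys-stop).
example_orf = "ATGTCGTCAAAATAG"
recoded_orf = recode_orf(example_orf)
print(recoded_orf)  # ATGAGCAGTAAATAA
```

Because the sequence is split into in-frame triplets before substitution, a TCG or TCA that appears only out of frame (for example across the boundary of two adjacent codons) is left untouched.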

FIG. 19 —Final synthetic E. coli genome (Syn61) (SEQ ID NO: 2)

Sequence of E. coli Syn61, in which all 1.8×10⁴ target codons in the genome are recoded. The synthesis of our recoded genome introduced only eight non-programmed mutations (Table 6); four of these mutations arose during the preparation of the 100 kb BACs, and four during the recoding process.

FIGS. 20A-20D—BACs for assembling synthetic genome

FIG. 20A, BAC-sacB-CmR-rpsL. The nucleotide sequence for an annotated BAC vector harbouring a sacB-CmR selection cassette flanked upstream by a 5′ homology region (HR) and CRISPR/Cas9 protospacer sequence (spacer 1). The sacB-CmR cassette is flanked downstream by a 3′ homology region, a CRISPR/Cas9 protospacer sequence (spacer 2), and an rpsL selection marker.

FIG. 20B, BAC-rpsL-KanR-sacB. The nucleotide sequence for an annotated BAC vector harbouring an rpsL-KanR selection cassette flanked upstream by a 5′ homology region (HR) and CRISPR/Cas9 protospacer sequence (spacer 1). The rpsL-KanR cassette is flanked downstream by a 3′ homology region, a CRISPR/Cas9 protospacer sequence (spacer 2), and a sacB selection marker.

FIG. 20C, BAC-rpsL-KanR-pheS*-HygR. The nucleotide sequence for an annotated BAC vector harbouring an rpsL-KanR selection cassette flanked upstream by a 5′ homology region (HR) and CRISPR/Cas9 protospacer sequence (spacer 1). The rpsL-KanR cassette is flanked downstream by a 3′ homology region, a CRISPR/Cas9 protospacer sequence (spacer 2), and a pheS*-HygR selection marker.

FIG. 20D, Table of BAC Construction. Oligonucleotides and selection markers used to construct BACs with synthetic DNA for REXER and homology regions between synthetic DNA fragments. The second tab lists the plasmid backbone and protospacer sequences used for REXER.

FIGS. 21A and 21B—Exemplary spacer plasmid maps

FIG. 21A, Spacer plasmid map. Exemplary map of pKW1_MB1amp_Spacers_REXER2 containing the CRISPR insert with spacer sequences used as linear or circular spacers for REXER.

FIG. 21B, Second generation spacer plasmid map. Exemplary map of pKW3_MB1amp_Spacers_REXER2 containing the CRISPR insert with spacer sequences used as circular 2nd generation spacers for REXER.

FIGS. 22A-22C—Constructs for conjugation

FIG. 22A, Gentamycin resistance OriT cassette.

FIG. 22B, Primers for conjugation constructs. Oligonucleotide primers used for conjugation.

FIG. 22C, pJF146. F′ plasmid that does not self-transfer.

FIG. 23 —Primers for deletion experiments

Oligonucleotide primers used for deletion of the tRNAs serT and serU and release factor prfA in Syn61.

DETAILED DESCRIPTION

The terms “comprising”, “comprises” and “comprised of” as used herein are synonymous with “including” or “includes”; or “containing” or “contains”, and are inclusive or open-ended and do not exclude additional, non-recited members, elements or steps. The terms “comprising”, “comprises” and “comprised of” also include the term “consisting of”.

Synthetic Genomes

Genomes

As used herein, a “genome” is the genetic material of an organism, including both genes and non-coding DNA. As used herein, a “synthetic genome” is a synthetically-built genome. Typically a synthetic genome will be produced by genetic modification of a pre-existing (i.e. “parent”) genome. Thus, a synthetic genome may be derived from a parent genome, i.e. identical to a parent genome, except comprising one or more genetic modifications. The skilled person will be able to readily identify the parent genome on which a synthetic genome is based and the genetic modifications carried out. As used herein, a “parent genome” may be any naturally-occurring, commercially-available, deposited, catalogued or otherwise well-known genome, or derivative thereof.

The synthetic genome of the present invention is a synthetic prokaryotic genome. A prokaryote is a unicellular organism that lacks a membrane-bound nucleus, mitochondria, or any other membrane-bound organelle. Prokaryotes are divided into two domains, Archaea and Bacteria. The genome of prokaryotic organisms generally is a circular, double-stranded piece of DNA, multiple copies of which may exist at any time.

Preferably, the synthetic genome of the present invention is a synthetic bacterial genome. Preferably the synthetic bacterial genome is suitable for heterologous protein production, in particular the production of polypeptides comprising one or more non-proteinogenic amino acids (for instance those described by Ferrer-Miralles, N. and Villaverde, A., 2013. Microbial Cell Factories, 12:113). Suitable bacterial genomes include: escherichia (e.g. Escherichia coli), caulobacteria (e.g. Caulobacter crescentus), phototrophic bacteria (e.g. Rhodobacter sphaeroides), cold adapted bacteria (e.g. Pseudoalteromonas haloplanktis, Shewanella sp. strain Ac10), pseudomonads (e.g. Pseudomonas fluorescens, Pseudomonas putida, Pseudomonas aeruginosa), halophilic bacteria (e.g. Halomonas elongata, Chromohalobacter salexigens), streptomycetes (e.g. Streptomyces lividans, Streptomyces griseus), nocardia (e.g. Nocardia lactamdurans), mycobacteria (e.g. Mycobacterium smegmatis), coryneform bacteria (e.g. Corynebacterium glutamicum, Corynebacterium ammoniagenes, Brevibacterium lactofermentum), bacilli (e.g. Bacillus subtilis, Bacillus brevis, Bacillus megaterium, Bacillus licheniformis, Bacillus amyloliquefaciens), and lactic acid bacteria (e.g. Lactococcus lactis, Lactobacillus plantarum, Lactobacillus casei, Lactobacillus reuteri, Lactobacillus gasseri) genomes. In some embodiments the synthetic genome is a synthetic gram-negative bacterial genome.

Bacterial genomes can range in size anywhere from about 130 kb to over 14 Mb. Thus, in some embodiments the synthetic prokaryotic genome of the present invention is 100 kb to 20 Mb, or 130 kb to 15 Mb, or 200 kb to 15 Mb, or 300 kb to 15 Mb, or 500 kb to 15 Mb, or 1 Mb to 15 Mb, or 1 Mb to 10 Mb, or 1 Mb to 8 Mb, or 1 Mb to 6 Mb, or 2 Mb to 6 Mb, or 2 Mb to 5 Mb, or 3 Mb to 5 Mb, or about 4 Mb in size. The synthetic prokaryotic genome may comprise 100 or more, 200 or more, 300 or more, 400 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, 1000 or more, 1500 or more, or 2000 or more genes, preferably 1000 or more genes. The synthetic prokaryotic genome may comprise 100 or more, 200 or more, 300 or more, 400 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, 1000 or more, 1500 or more, or 2000 or more genes for which there is evidence of translation and/or of the predicted protein product, preferably 1000 or more genes. Preferably the synthetic prokaryotic genome comprises 100 or more, 200 or more, 300 or more, 400 or more, 500 or more essential genes, preferably 300 or more essential genes.

Preferably, the synthetic genome of the present invention is a synthetic Escherichia coli, Salmonella enterica, or Shigella dysenteriae genome. These are phylogenetically related species as disclosed by Lukjancenko, O., et al., 2010. Microbial ecology, 60(4), pp. 708-720; and Karberg, K. A., et al., 2011. PNAS, 108(50), pp. 20154-20159.

More preferably, the synthetic genome of the present invention is a synthetic E. coli genome. The parent genome may be any suitable E. coli genome including MDS42, K-12, MG1655, BL21, BL21(DE3), AD494, Origami, HMS174, BLR(DE3), HMS174(DE3), Tuner(DE3), Origami2(DE3), Rosetta2(DE3), Lemo21(DE3), NiCo21(DE3), T7 Express, SHuffle Express, C41(DE3), C43(DE3), and M15 pREP4 or derivatives thereof (Rosano, G. L. and Ceccarelli, E. A., 2014. Frontiers in microbiology, 5, p. 172). Most preferably, the parent genome is MDS42, MG1655, or BL21 or a derivative thereof. MG1655 is considered the wild-type strain of E. coli. The GenBank ID of the genomic sequence of this strain is U00096. BL21 is widely available commercially. For example, it can be purchased from New England BioLabs with catalog number C2530H (https://www.neb.com/products/c2530-bl21-competent-e-coli).

In some embodiments the synthetic genome is a reduced synthetic genome or a minimal synthetic genome. A “reduced genome” is one in which the size of the parent genome has been reduced by removing non-essential genes and/or non-coding regions. A “minimal genome” is a genome which has been reduced to its minimal size whilst remaining viable e.g. by deletion of all non-essential regions of the genome.

The synthetic genome of the present invention may be a viable genome. As used herein, a “viable genome” refers to a genome that contains nucleic acid sequences sufficient to cause and/or sustain viability of a cell, e.g., those encoding molecules required for replication, transcription, translation, energy production, transport, production of membranes and cytoplasmic components, and cell division.

Preferably one or more tRNAs or release factors may be deleted from the synthetic genome and the synthetic genome may remain viable. For example, a tRNA which decodes only the one or more sense codons that have been replaced (or deleted) may be dispensable. Similarly, a tRNA which decodes the one or more sense codons that have been replaced (or deleted) may be dispensable if the remaining sense codons that it decodes may also be decoded by an alternative tRNA. For example, serT, encoding tRNA^(Ser) _(UGA), is the only tRNA that decodes TCA codons in E. coli, and is therefore normally essential. However, if the synthetic genome does not contain TCA codons then serT may be dispensable.
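The dispensability reasoning above can be sketched as a simple check, given purely as an illustration. The function name `is_dispensable` and the decoding table are hypothetical: only the statement that serT decodes TCA is taken from the text, and the other table entries are assumptions chosen to exercise both cases.

```python
def is_dispensable(trna, decoding, genome_codons):
    """Return True if every codon still present in the genome that `trna`
    decodes can also be decoded by at least one other tRNA."""
    for codon in decoding[trna]:
        if codon not in genome_codons:
            continue  # codon was removed from the genome; nothing to decode
        alternatives = [
            other for other, codons in decoding.items()
            if other != trna and codon in codons
        ]
        if not alternatives:
            return False
    return True


# Toy decoding table. Only "serT decodes TCA" comes from the text;
# the remaining entries are illustrative assumptions.
decoding = {
    "serT": {"TCA", "TCT"},
    "tRNA_X": {"TCT", "TCC"},  # hypothetical alternative tRNA
}
parent_codons = {"TCA", "TCT", "TCC"}
recoded_codons = {"TCT", "TCC"}  # TCA removed genome-wide

print(is_dispensable("serT", decoding, parent_codons))   # False
print(is_dispensable("serT", decoding, recoded_codons))  # True
```

In this toy table serT is required while TCA remains in the genome, and becomes dispensable once TCA is absent and its other codon is covered by an alternative tRNA.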

Sense Codons

The current invention provides a synthetic prokaryotic genome comprising 5 or fewer occurrences of one or more sense codons; and/or a synthetic prokaryotic genome derived from a parent genome, wherein the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of one or more sense codons, relative to the parent genome; and/or a synthetic prokaryotic genome comprising 100 or more, 200 or more, or 1000 or more genes with no occurrences of one or more sense codons.

The one or more sense codons may consist of one, two, three, four, five, six, seven, or eight sense codons. Preferably, the one or more sense codons consist of one sense codon or two sense codons, most preferably two sense codons.

The synthetic prokaryotic genome may comprise 5 or fewer (e.g. 5, 4, 3, 2, 1), or no occurrences of one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons. In some embodiments the synthetic prokaryotic genome comprises 5 or fewer (e.g. 5, 4, 3, 2, 1, 0) of each of the one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons. In other embodiments the synthetic prokaryotic genome comprises 5 or fewer (e.g. 5, 4, 3, 2, 1, 0) of the one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons combined (i.e. in total). In preferred embodiments the synthetic prokaryotic genome comprises no occurrences of one sense codon. In other preferred embodiments the synthetic prokaryotic genome comprises no occurrences of two sense codons.

The synthetic prokaryotic genome may be derived from a parent genome and comprise 5 or fewer (e.g. 5, 4, 3, 2, 1), or no occurrences of one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) native sense codons. In some embodiments the synthetic prokaryotic genome comprises 5 or fewer (e.g. 5, 4, 3, 2, 1, 0) of each of the one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) native sense codons. In other embodiments the synthetic prokaryotic genome comprises 5 or fewer (e.g. 5, 4, 3, 2, 1, 0) of the one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) native sense codons combined (i.e. in total). In preferred embodiments the synthetic prokaryotic genome is derived from a parent genome and comprises no occurrences of one native sense codon. In other preferred embodiments the synthetic prokaryotic genome is derived from a parent genome and comprises no occurrences of two native sense codons.

In some embodiments the synthetic prokaryotic genome comprises 100 or more, 200 or more, 300 or more, 400 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, 1000 or more, 1500 or more, or 2000 or more genes, preferably 1000 or more genes. In some embodiments the genes are those for which there is evidence of translation and/or of the predicted protein product. For example, the synthetic prokaryotic genome may comprise 100 or more, 200 or more, 300 or more, 400 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, 1000 or more, 1500 or more, or 2000 or more genes, preferably 1000 or more genes for which there is evidence of translation and/or of the predicted protein product. Preferably the synthetic prokaryotic genome comprises 100 or more, 200 or more, 300 or more, 400 or more, 500 or more essential genes, preferably 300 or more essential genes. Preferably the (essential) genes have no occurrences of the one or more sense codons.

The synthetic prokaryotic genome may comprise less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons, relative to the parent genome. In some embodiments the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of each of the one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons, relative to the parent genome. In other embodiments the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of the one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons combined, relative to the parent genome. In preferred embodiments the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of one sense codon, relative to the parent genome. In other preferred embodiments the synthetic prokaryotic genome comprises less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of two sense codons, relative to the parent genome.
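To illustrate the difference between the "each" and "combined" thresholds, the following sketch computes the percentage of parent-genome occurrences that remain in a synthetic genome. The function names and the codon counts are hypothetical and chosen only for the example; they are not counts from any genome described herein.

```python
def percent_remaining_each(parent, synthetic, codons):
    """Percentage of parent occurrences remaining, for each codon separately."""
    result = {}
    for codon in codons:
        if parent.get(codon, 0) == 0:
            raise ValueError(f"{codon} does not occur in the parent genome")
        result[codon] = 100.0 * synthetic.get(codon, 0) / parent[codon]
    return result


def percent_remaining_combined(parent, synthetic, codons):
    """Percentage of parent occurrences remaining, summed over all codons."""
    parent_total = sum(parent.get(codon, 0) for codon in codons)
    if parent_total == 0:
        raise ValueError("none of the codons occur in the parent genome")
    synthetic_total = sum(synthetic.get(codon, 0) for codon in codons)
    return 100.0 * synthetic_total / parent_total


# Hypothetical counts, for illustration only.
parent_counts = {"TCG": 8000, "TCA": 10000}
synthetic_counts = {"TCG": 0, "TCA": 5}

each = percent_remaining_each(parent_counts, synthetic_counts, ["TCG", "TCA"])
combined = percent_remaining_combined(parent_counts, synthetic_counts, ["TCG", "TCA"])
print(each)
print(combined)
```

With these illustrative counts, both the per-codon and the combined percentages fall below the 0.1% threshold.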

The synthetic prokaryotic genome may comprise 100 or more, 200 or more,or 1000 or more genes with no occurrences of one or more (e.g. 1, 2, 3,4, 5, 6, 7, or 8) sense codons. Preferably, all or substantially all thegenes in the synthetic prokaryotic genome have no occurrences of the oneor more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) sense codons. In preferredembodiments, all or substantially all the genes in the syntheticprokaryotic genome have no occurrences of one sense codon. In otherpreferred embodiments, all or substantially all the genes in thesynthetic prokaryotic genome have no occurrences of two sense codons. Bysubstantially all is meant all but 10 or fewer (e.g. 10, 9. 8, 7, 6, 5,4, 3, 2, 1, or 0) genes comprise occurrences of the one or more sensecodons.

The synthetic prokaryotic genome may comprise 100 or more, 200 or more,or 1000 or more genes with no occurrences of one or more (e.g. 1, 2, 3,4, 5, 6, 7, or 8) native sense codons. Preferably, all or substantiallyall the genes in the synthetic prokaryotic genome have no occurrences ofthe one or more (e.g. 1, 2, 3, 4, 5, 6, 7, or 8) native sense codons. Inpreferred embodiments, all or substantially all the genes in thesynthetic prokaryotic genome have no occurrences of one native sensecodon. In other preferred embodiments, all or substantially all thegenes in the synthetic prokaryotic genome have no occurrences of twonative sense codons. By substantially all is meant all but 10 or fewer(e.g. 10, 9. 8, 7, 6, 5, 4, 3, 2, 1, or 0) genes comprise occurrences ofthe one or more native sense codons.

Preferably the genes encode proteins (e.g. the genes are those for whichthere is evidence of translation and/or of the predicted proteinproduct) and/or the genes are essential genes. Thus, in more preferredembodiments the synthetic prokaryotic genome comprises 100 or more, 200or more, or 1000 or more protein-encoding and/or 100 or more, 200 ormore, or 300 or more essential genes with no occurrences of one or twosense codons. In other more preferred embodiments all or substantiallyall the protein-encoding and/or essential genes in the syntheticprokaryotic genome comprise no occurrences of one or two sense codons.

In preferred embodiments no proteins are translated from any of the remaining occurrences of the one or more sense codons and/or genes comprising the remaining occurrences of the one or more sense codons are putative or non-coding genes. In some embodiments the translation of the genes comprising the remaining occurrences of the one or more sense codons is reduced and/or prevented (e.g. the genes may comprise stop codons in the 5′ sequence).

Any remaining occurrences of the sense codons may be necessary to ensure that the synthetic prokaryotic genome is viable. For example, one or more, preferably all, of the remaining occurrences of the one or more sense codons in the synthetic prokaryotic genome may be present in the regulatory elements of essential genes; and/or one or more, preferably all, of the remaining occurrences of the one or more sense codons may be in genes in which there is no evidence for translation or the predicted protein product (i.e. putative or non-coding genes).

As used herein, a “sense codon” is a nucleotide triplet that codes for an amino acid. Thus, sense codons may be identified in a genome by gene prediction, i.e. by identifying regions of the genome that code for proteins (i.e. genes) and the corresponding open reading frames (ORFs). Typically, genomes naturally comprise 61 sense codons: GCT, GCC, GCA, GCG, CGT, CGC, CGA, CGG, AGA, AGG, AAT, AAC, GAT, GAC, TGT, TGC, CAA, CAG, GAA, GAG, GGT, GGC, GGA, GGG, CAT, CAC, ATT, ATC, ATA, TTA, TTG, CTT, CTC, CTA, CTG, AAA, AAG, ATG, TTT, TTC, CCT, CCC, CCA, CCG, TCT, TCC, TCA, TCG, AGT, AGC, ACT, ACC, ACA, ACG, TGG, TAT, TAC, GTT, GTC, GTA, and GTG (read from 5′ to 3′ on the coding strand of DNA). The standard genetic code encodes the 20 canonical amino acids using the 61 triplet codons. 18 of the 20 amino acids are encoded by more than one synonymous codon (see FIG. 17). The one or more sense codons may be one or more native sense codons, i.e. sense codons which are present in the parent genome.
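For illustration, the counts stated above (61 sense codons, 20 amino acids, 18 of which have more than one synonymous codon) can be reproduced from the standard genetic code. The sketch below writes the standard code as a 64-letter string in the conventional TCAG codon order, with `*` marking stop codons; the variable names are illustrative assumptions only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense codons are all triplets that encode an amino acid rather than a stop.
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa != "*")
STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

# Group synonymous codons by the amino acid they encode.
SYNONYMS = {}
for codon in SENSE_CODONS:
    SYNONYMS.setdefault(CODON_TABLE[codon], []).append(codon)

# Amino acids encoded by more than one synonymous codon.
MULTI_CODON = sorted(aa for aa, cs in SYNONYMS.items() if len(cs) > 1)
```

In this table only methionine (ATG) and tryptophan (TGG) have a single codon, which is why synonymous replacement is available for the remaining 18 amino acids.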

The 61 sense codons in DNA are transcribed into corresponding mRNA and subsequently decoded by one or more tRNAs. tRNAs carry an amino acid to a ribosome as directed by the sense codons in the mRNA. The tRNAs can recognise one or more sense codons via a complementary anticodon. A sequence of sense codons is subsequently translated into a polypeptide (i.e. a sequence of amino acids). Codon and anticodon interactions in the E. coli genome are shown in FIG. 17.

Preferably, the genome-wide removal of the one or more sense codons, but not other sense codons, enables all the cognate tRNAs corresponding to said one or more sense codons to be deleted without removing the ability to decode the sense codons remaining in the genome. Thus, the one or more sense codons may be selected from: TCG, TCA, AGT, AGC, GCG, GCA, GTG, GTA, CTG, CTA, TTG, TTA, ACG, ACA, CCG, CCA, CGG, CGA, CGT, CGC, AGG, AGA, GGG, GGA, GGT, GGC, ATT, and ATC.

Aminoacyl-tRNA synthetases for serine, leucine and alanine do not recognize the anticodons of their cognate tRNAs. This may facilitate the assignment of codons within these boxes to new amino acids through the introduction of tRNAs bearing cognate anticodons that do not direct mis-aminoacylation by endogenous synthetases. Thus, the one or more sense codons may be selected from: TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA.

Preferably, the one or more sense codons fulfill both these criteria, thus the one or more sense codons may be selected from: TCG, TCA, AGT, AGC, GCG, GCA, CTG, CTA, TTG, and TTA. More preferably, the one or more sense codons are selected from TCG, TCA, AGT, AGC, TTG, TTA, GCG and GCA. Most preferably, the one or more sense codons are TCG and/or TCA.

Preferably, one or more sense codons are removed such that the genome is compatible with codon reassignment to non-proteinogenic amino acids. Thus, the one or more sense codons may comprise one or more of TCA, CTA, or TTA. Alternatively, two or more sense codons are removed, wherein the two or more sense codons comprise one or more of the sense codon pairs selected from the group consisting of: GCG and GCA; GCT and GCC; TCG and TCA; AGT and AGC; TCT and TCC; CTG and CTA; TTG and TTA; and CTT and CTC. Preferably, two or more sense codons are removed, wherein the two or more sense codons comprise one or more of the sense codon pairs selected from the group consisting of: GCG and GCA; TCG and TCA; AGT and AGC; CTG and CTA; and TTG and TTA. More preferably, the two or more sense codons comprise TCG and TCA.

To achieve removal of sense codons they may be replaced with synonymous sense codons. This is preferable to ensure that the encoded protein sequence is not changed. For instance, the present invention provides a synthetic prokaryotic genome wherein 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.6% or more, 99.7% or more, 99.8% or more, 99.9% or more, or 100% of the occurrences of one or more sense codons in the parent genome are replaced with synonymous sense codons. The person skilled in the art is able to deduce suitable synonymous sense codon replacements. For example, in E. coli, typically TCG, TCA, TCT, TCC, AGT and AGC all encode serine; typically GCG, GCA, GCT and GCC all encode alanine; typically CTG, CTA, CTT, CTC, TTG and TTA all encode leucine.

In some embodiments, the replacement is a defined replacement, i.e. one sense codon is replaced with a single synonymous sense codon. Preferably, 90% or more, 95% or more, 98% or more, 99% or more, 99.5% or more, 99.6% or more, 99.7% or more, 99.8% or more, 99.9% or more, or 100% of the occurrences of one or more sense codons in the parent genome are replaced with a defined (i.e. single) synonymous sense codon.

For example, the defined replacement may be: GCG replaced with either GCT or GCC; GCA replaced with either GCT or GCC; TCG replaced with any one of TCT, TCC, AGT, or AGC; TCA replaced with any one of TCT, TCC, AGT, or AGC; AGT replaced with any one of TCG, TCA, TCT, or TCC; AGC replaced with any one of TCG, TCA, TCT, or TCC; CTG replaced with any one of CTT, CTC, TTG or TTA; CTA replaced with any one of CTT, CTC, TTG or TTA; TTG replaced with any one of CTG, CTA, CTT or CTC; or TTA replaced with any one of CTG, CTA, CTT or CTC. Preferably the one or more defined sense codon replacements are selected from one or more of: GCG to either GCT or GCC; GCA to either GCT or GCC; TCG to either AGT or AGC; TCA to either AGT or AGC; AGT to either TCA or TCT; AGC to either TCG or TCC or TCA; TTG to CTT; and TTA to CTC. More preferably, TCG and/or TCA are replaced with AGC and/or AGT. Most preferably, TCG is replaced with AGC and/or TCA is replaced with AGT.
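A defined replacement of this kind can be sketched as a codon-by-codon substitution within the reading frame. The example below is a minimal illustration of the most preferred scheme (TCG to AGC, TCA to AGT), assuming a single in-frame ORF with no overlapping genes; `recode_orf` and `DEFINED_SCHEME` are hypothetical names, and overlaps and regulatory sequences are not handled.

```python
# Serine codons, as listed above for E. coli.
SERINE = {"TCG", "TCA", "TCT", "TCC", "AGT", "AGC"}

# Most preferred defined scheme: each target codon maps to one synonym.
DEFINED_SCHEME = {"TCG": "AGC", "TCA": "AGT"}


def recode_orf(orf, scheme=DEFINED_SCHEME):
    """Replace target codons in frame; all other triplets are kept as-is."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    triplets = (orf[i:i + 3] for i in range(0, len(orf), 3))
    return "".join(scheme.get(t, t) for t in triplets)


recoded = recode_orf("ATGTCGTCAGCATAA")  # -> "ATGAGCAGTGCATAA"
```

Because both the target and the replacement codon encode serine, the encoded protein sequence is unchanged; a triplet such as TCG that occurs only out of frame is left untouched.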

Preferably, the defined replacement is such that the genome is compatible with codon reassignment to non-proteinogenic amino acids. For example: (i) GCG may be replaced with either GCT or GCC, and GCA may be replaced with either GCT or GCC; (ii) TCG may be replaced with any of TCT, TCC, AGT, or AGC, and TCA may be replaced with any of TCT, TCC, AGT, or AGC; (iii) AGT may be replaced with any of TCG, TCA, TCT, or TCC, and AGC may be replaced with any of TCG, TCA, TCT, or TCC; (iv) CTG may be replaced with any of CTT, CTC, TTG or TTA, and CTA may be replaced with any of CTT, CTC, TTG or TTA; or (v) TTG may be replaced with any of CTG, CTA, CTT or CTC, and TTA may be replaced with any of CTG, CTA, CTT or CTC.

Preferably, the defined replacement scheme is one or more of those listed in the table below:

Codon 1        Codon 2
From   To      From   To
GCG    GCT     GCA    GCT
GCG    GCT     GCA    GCC
GCG    GCC     GCA    GCT
GCG    GCC     GCA    GCC
TCG    TCT     TCA    TCT
TCG    TCT     TCA    TCC
TCG    TCT     TCA    AGT
TCG    TCT     TCA    AGC
TCG    TCC     TCA    TCT
TCG    TCC     TCA    TCC
TCG    TCC     TCA    AGT
TCG    TCC     TCA    AGC
TCG    AGT     TCA    TCT
TCG    AGT     TCA    TCC
TCG    AGT     TCA    AGT
TCG    AGT     TCA    AGC
TCG    AGC     TCA    TCT
TCG    AGC     TCA    TCC
TCG    AGC     TCA    AGT
TCG    AGC     TCA    AGC
AGT    TCG     AGC    TCG
AGT    TCG     AGC    TCA
AGT    TCG     AGC    TCT
AGT    TCG     AGC    TCC
AGT    TCA     AGC    TCG
AGT    TCA     AGC    TCA
AGT    TCA     AGC    TCT
AGT    TCA     AGC    TCC
AGT    TCT     AGC    TCG
AGT    TCT     AGC    TCA
AGT    TCT     AGC    TCT
AGT    TCT     AGC    TCC
AGT    TCC     AGC    TCG
AGT    TCC     AGC    TCA
AGT    TCC     AGC    TCT
AGT    TCC     AGC    TCC
CTG    CTT     CTA    CTT
CTG    CTT     CTA    CTC
CTG    CTT     CTA    TTG
CTG    CTT     CTA    TTA
CTG    CTC     CTA    CTT
CTG    CTC     CTA    CTC
CTG    CTC     CTA    TTG
CTG    CTC     CTA    TTA
CTG    TTG     CTA    CTT
CTG    TTG     CTA    CTC
CTG    TTG     CTA    TTG
CTG    TTG     CTA    TTA
CTG    TTA     CTA    CTT
CTG    TTA     CTA    CTC
CTG    TTA     CTA    TTG
CTG    TTA     CTA    TTA
TTG    CTG     TTA    CTG
TTG    CTG     TTA    CTA
TTG    CTG     TTA    CTT
TTG    CTG     TTA    CTC
TTG    CTA     TTA    CTG
TTG    CTA     TTA    CTA
TTG    CTA     TTA    CTT
TTG    CTA     TTA    CTC
TTG    CTT     TTA    CTG
TTG    CTT     TTA    CTA
TTG    CTT     TTA    CTT
TTG    CTT     TTA    CTC
TTG    CTC     TTA    CTG
TTG    CTC     TTA    CTA
TTG    CTC     TTA    CTT
TTG    CTC     TTA    CTC
GCT    GCG     GCC    GCG
GCT    GCG     GCC    GCA
GCT    GCA     GCC    GCG
GCT    GCA     GCC    GCA
TCA    TCG
TCA    TCT
TCA    TCC
TCA    AGT
TCA    AGC
TCT    TCG     TCC    TCG
TCT    TCG     TCC    TCA
TCT    TCG     TCC    AGT
TCT    TCG     TCC    AGC
TCT    TCA     TCC    TCG
TCT    TCA     TCC    TCA
TCT    TCA     TCC    AGT
TCT    TCA     TCC    AGC
TCT    AGT     TCC    TCG
TCT    AGT     TCC    TCA
TCT    AGT     TCC    AGT
TCT    AGT     TCC    AGC
TCT    AGC     TCC    TCG
TCT    AGC     TCC    TCA
TCT    AGC     TCC    AGT
TCT    AGC     TCC    AGC
CTA    CTG
CTA    CTT
CTA    CTC
CTA    TTG
CTA    TTA
TTA    CTG
TTA    CTA
TTA    CTT
TTA    CTC
TTA    TTG
CTT    CTG     CTC    CTG
CTT    CTG     CTC    CTA
CTT    CTG     CTC    TTG
CTT    CTG     CTC    TTA
CTT    CTA     CTC    CTG
CTT    CTA     CTC    CTA
CTT    CTA     CTC    TTG
CTT    CTA     CTC    TTA
CTT    TTG     CTC    CTG
CTT    TTG     CTC    CTA
CTT    TTG     CTC    TTG
CTT    TTG     CTC    TTA
CTT    TTA     CTC    CTG
CTT    TTA     CTC    CTA
CTT    TTA     CTC    TTG
CTT    TTA     CTC    TTA

Preferably, none of these codon replacements affect ribosomal binding sites (AGGAGG), which are highly conserved regulatory sequences in E. coli. The selected codon replacements may be tested on a small test region (e.g. a 20 kb region of the genome rich in both essential target genes and target codons) to assess viability. If the codon replacements are not viable on the small test region they may be disregarded.
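As a purely illustrative screen, a candidate replacement can be checked for whether it creates or destroys an exact AGGAGG motif. The sketch below compares motif positions before and after a length-preserving recoding; the function names and the toy sequence are assumptions, and only exact matches on the given strand are considered.

```python
RBS_MOTIF = "AGGAGG"


def motif_positions(seq, motif=RBS_MOTIF):
    """Start indices of every (possibly overlapping) exact motif match."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def replacement_affects_rbs(before, after):
    """True if recoding changed where AGGAGG occurs (same-length inputs)."""
    if len(before) != len(after):
        raise ValueError("sequences must have the same length")
    return motif_positions(before) != motif_positions(after)


# Toy region: the in-frame TCA codon contributes the first A of AGGAGG.
before = "ATGTCAGGAGGTTAA"
after = "ATGAGTGGAGGTTAA"  # same region with TCA replaced by AGT
affected = replacement_affects_rbs(before, after)
```

In this toy case the TCA to AGT replacement removes the motif, so the replacement would be flagged for closer inspection.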

When replacement of one or more sense codons in the parent genome with defined replacement synonymous sense codons does not result in a viable genome, alternative replacement synonymous sense codons may be used. For instance, 99.9% of the occurrences of one or more sense codons in the parent genome may be replaced with a defined (i.e. single) synonymous sense codon, and the remaining 0.1% with alternative synonymous sense codons. For example, 99.9% of the occurrences of TCG may be replaced with AGC and 0.1% replaced with TCT, TCC, AGT or AGC; and/or 99.9% of the occurrences of TCA may be replaced with AGT and 0.1% replaced with TCT, TCC, AGT or AGC.

As used herein, a “stop codon” is a nucleotide triplet that codes for termination of translation into proteins. Typically, genomes naturally comprise 3 stop codons: TAA (“ochre”), TGA (“opal” or “umber”) and TAG (“amber”).

In some embodiments the synthetic prokaryotic genome further comprises 10 or fewer, 5 or fewer, or no occurrences of one or two stop codons, preferably 10 or fewer, 5 or fewer, or no occurrences of the amber stop codon (TAG). Preferably, 90% or more, 95% or more, 98% or more, 99% or more, or all of the occurrences of TAG in the parent prokaryotic genome are replaced with TAA (the ochre stop codon). In preferred embodiments the synthetic prokaryotic genome comprises no occurrences of the amber stop codon (TAG), optionally wherein all of the occurrences of TAG in the parent prokaryotic genome are replaced with TAA (the ochre stop codon).
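The amber-to-ochre replacement can be illustrated in the same way as a sense codon replacement. The following sketch assumes each ORF is given in frame and ends with its stop codon; `replace_amber_stop` and `stop_codon_usage` are hypothetical helpers, and genes whose stop codon overlaps another gene are not considered.

```python
from collections import Counter


def replace_amber_stop(orf):
    """Swap a terminal in-frame TAG for TAA; other ORFs are returned as-is."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if orf[-3:] == "TAG":
        return orf[:-3] + "TAA"
    return orf


def stop_codon_usage(orfs):
    """Count how often each terminal triplet is used across the ORFs."""
    return Counter(orf[-3:] for orf in orfs)


# Hypothetical toy ORFs ending in amber, opal and ochre stops respectively.
parent_orfs = ["ATGGCTTAG", "ATGGCTTGA", "ATGGCTTAA"]
synthetic_orfs = [replace_amber_stop(orf) for orf in parent_orfs]
usage = stop_codon_usage(synthetic_orfs)
```

After replacement the toy set contains no terminal TAG, while the TGA-terminated ORF is left unchanged.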

Accordingly, in preferred embodiments the synthetic prokaryotic genome of the present invention comprises no occurrences of one or more, or two or more sense codons and no occurrences of one stop codon, preferably the amber stop codon (TAG). In more preferred embodiments the synthetic prokaryotic genome of the present invention comprises no occurrences of two sense codons, preferably TCG and TCA, and no occurrences of the amber stop codon (TAG), optionally wherein TCG, TCA and TAG in the parent prokaryotic genome are replaced with synonymous codons, for example 99.9% or more of the occurrences of TCG in the parent prokaryotic genome are replaced with AGC, 99.9% or more of the occurrences of TCA in the parent prokaryotic genome are replaced with AGT and all of the occurrences of TAG in the parent prokaryotic genome are replaced with TAA.

In some embodiments the synthetic prokaryotic genome comprises a polynucleotide sequence which is at least 80%, 85%, 90%, 95%, 98%, 99%, 99.5%, 99.8%, or 99.9% identical to SEQ ID NO:1 or SEQ ID NO:2.

The invention provides a synthetic prokaryotic genome which is at least 98%, 98.5%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 99.95% or 100% identical to SEQ ID NO:1 or SEQ ID NO:2. Sequence comparisons can be conducted by eye or, more usually, with the aid of readily available sequence comparison programs. These publicly and commercially available computer programs can calculate sequence identity between two or more sequences.

Sequence identity may be calculated over contiguous sequences, i.e. one sequence is aligned with the other sequence and each amino acid in one sequence directly compared with the corresponding amino acid in the other sequence, one residue at a time. This is called an “ungapped” alignment. Typically, such ungapped alignments are performed only over a relatively short number of residues (for example less than 50 contiguous amino acids).

Although this is a very simple and consistent method, it fails to take into consideration that, for example, in an otherwise identical pair of sequences, one insertion or deletion will cause the following amino acid residues to be put out of alignment, thus potentially resulting in a large reduction in % homology when a global alignment is performed. Consequently, most sequence comparison methods are designed to produce optimal alignments that take into consideration possible insertions and deletions without penalising unduly the overall homology score. This is achieved by inserting “gaps” in the sequence alignment to try to maximise local homology.

However, these more complex methods assign “gap penalties” to each gap that occurs in the alignment so that, for the same number of identical amino acids, a sequence alignment with as few gaps as possible (reflecting higher relatedness between the two compared sequences) will achieve a higher score than one with many gaps. “Affine gap costs” are typically used that charge a relatively high cost for the existence of a gap and a smaller penalty for each subsequent residue in the gap. This is the most commonly used gap scoring system. High gap penalties will of course produce optimised alignments with fewer gaps. Most alignment programs allow the gap penalties to be modified. However, it is preferred to use the default values when using such software for sequence comparisons. For example, when using the GCG Wisconsin Bestfit package (see below) the default gap penalty for amino acid sequences is −12 for a gap and −4 for each extension.
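The affine gap cost described above can be sketched as follows. The example assumes that a gap of n residues costs one opening penalty plus one extension penalty for each of the n − 1 subsequent residues; alignment programs differ in this convention, so the formula and the function name are illustrative assumptions rather than the behaviour of any particular package. The values 12 and 4 are the magnitudes of the amino acid defaults quoted above.

```python
import re


def affine_gap_penalty(aligned, gap_open=12, gap_extend=4):
    """Total affine gap cost for one row of a gapped alignment.

    Each run of '-' costs gap_open for its first position and
    gap_extend for every further position in the same run.
    """
    total = 0
    for run in re.findall(r"-+", aligned):
        total += gap_open + gap_extend * (len(run) - 1)
    return total


# One gap of length 3 and one gap of length 1:
# (12 + 2 * 4) + 12 = 32
penalty = affine_gap_penalty("AC---GT-A")
```

Under this scheme a single long gap is cheaper than several short gaps covering the same number of positions, which is why alignments with fewer gaps score higher.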

Calculation of maximum % sequence identity therefore firstly requires the production of an optimal alignment, taking into consideration gap penalties. A suitable computer program for carrying out such an alignment is the GCG Wisconsin Bestfit package (University of Wisconsin, U.S.A; Devereux et al., 1984, Nucleic Acids Research 12:387). Examples of other software that can perform sequence comparisons include, but are not limited to, the BLAST package (see Ausubel et al., 1999 ibid, Chapter 18), FASTA (Altschul et al., 1990, J. Mol. Biol., 403-410) and the GENEWORKS suite of comparison tools. Both BLAST and FASTA are available for offline and online searching (see Ausubel et al., 1999 ibid, pages 7-58 to 7-60). However, it is preferred to use the GCG Bestfit program.

Suitably, the sequence identity may be determined across the entirety of the sequence. Suitably, the sequence identity may be determined across the entirety of the candidate sequence being compared to a sequence recited herein.

Although the final sequence identity can be measured in terms of identity, the alignment process itself is typically not based on an all-or-nothing pair comparison. Instead, a scaled similarity score matrix is generally used that assigns scores to each pairwise comparison based on chemical similarity or evolutionary distance. An example of such a matrix commonly used is the BLOSUM62 matrix (the default matrix for the BLAST suite of programs). GCG Wisconsin programs generally use either the public default values or a custom symbol comparison table if supplied (see user manual for further details). Preferably, the public default values for the GCG package, or in the case of other software the default matrix, such as BLOSUM62, are used.

Once the software has produced an optimal alignment, it is possible to calculate % sequence identity. The software typically does this as part of the sequence comparison and generates a numerical result.
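For illustration, % sequence identity can be computed from an alignment that has already been produced. The sketch below assumes the two aligned rows are of equal length with `-` marking gaps, and it divides the number of identical positions by the number of alignment columns; other denominators (for example the length of the shorter sequence) are also in use, so this choice and the function name are assumptions.

```python
def percent_identity(row_a, row_b):
    """% identity over the columns of a pairwise alignment ('-' = gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Ignore columns in which both rows carry a gap.
    columns = [(a, b) for a, b in zip(row_a, row_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# 8 columns, 6 identical positions (one mismatch, one gap).
identity = percent_identity("ACGTAC-T", "ACGTTCGT")
```

With this convention, gapped columns count against identity, so the toy alignment scores 75% rather than the 6/7 that would result from ignoring the gap.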

Refactoring

Genomes contain numerous overlapping open reading frames (ORFs), which can be classified as 3′, 3′ (between ORFs in opposite orientations) or 5′, 3′ (between ORFs in the same orientation). The one or more sense codons (i.e. those to be replaced) may be found within both classes of overlap in the parent genome.

If the replacement of the one or more sense codons of each ORF within an overlap can be achieved without changing the encoded protein sequence of either ORF (i.e. by introducing synonymous codon(s)) then it may not be necessary to edit (e.g. refactor) the parent genome. However, when the encoded protein sequence is changed by the replacement of the one or more sense codons (i.e. one or more synonymous sense codons are not introduced into one or both of the ORFs), then it may be necessary to edit (e.g. refactor) the parent genome.

Thus, in some embodiments one or more pairs of genes which share an overlapping region comprising the one or more sense codons in the parent genome are refactored. “Refactored” means that the genes are reorganised to prevent changes to the encoded protein sequences. Preferably, the pairs of genes are those in which sense codon replacements (e.g. defined synonymous codon replacements) would change the encoded protein sequence of both or either of the pair of genes. Most preferably, all pairs of genes which share an overlapping region comprising the one or more sense codons in the parent genome are refactored, wherein the pairs of genes are those in which sense codon replacements (e.g. defined synonymous codon replacements) would change the encoded protein sequence of both or either of the pair of genes.

For 3′, 3′ overlaps (i.e. pairs of genes in opposite orientations) a synthetic insert may be inserted between the genes. For 3′, 3′ overlaps the synthetic insert may comprise the overlapping region.

For 5′, 3′ overlaps (i.e. pairs of genes in the same orientation, comprising an upstream gene and a downstream gene) a synthetic insert may be inserted between the genes. For 5′, 3′ overlaps the synthetic insert may comprise: (i) a stop codon; (ii) about 20-200 bp, or 20-100 bp, or 20-50 bp, from upstream of the overlapping region; and (iii) the overlapping region. Preferably, the synthetic insert comprises: (i) a stop codon; (ii) about 20 bp from upstream of the overlapping region; and (iii) the overlapping region. This preserves the sequence of the RBS for the downstream ORF and the distance between this RBS and its start codon.

In preferred embodiments the stop codon is in frame with the original start site for the downstream gene. Preferably the stop codon is TAA.
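The composition of a synthetic insert for a 5′, 3′ overlap can be illustrated by assembling the three listed components from a parent sequence. In the sketch below the 0-based, half-open coordinates, the function name and the concatenation order (stop codon, then upstream sequence, then overlapping region) are assumptions made for illustration; the position at which the insert is placed and the reading frame of the stop codon are not modelled.

```python
def build_refactoring_insert(genome, overlap_start, overlap_end,
                             upstream_bp=20, stop_codon="TAA"):
    """Assemble stop codon + upstream sequence + overlapping region.

    Coordinates are 0-based and half-open: the overlap is
    genome[overlap_start:overlap_end].
    """
    if not 0 <= overlap_start < overlap_end <= len(genome):
        raise ValueError("overlap coordinates are out of range")
    if overlap_start < upstream_bp:
        raise ValueError("not enough sequence upstream of the overlap")
    upstream = genome[overlap_start - upstream_bp:overlap_start]
    overlap = genome[overlap_start:overlap_end]
    return stop_codon + upstream + overlap


# Hypothetical toy sequence: 30 bp upstream, a 4 bp overlap, 10 bp downstream.
toy_genome = "A" * 30 + "CCCC" + "G" * 10
insert = build_refactoring_insert(toy_genome, 30, 34)
```

Copying the upstream sequence together with the overlapping region is what keeps the RBS of the downstream ORF, and its spacing from the start codon, intact in the duplicated copy.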

Aside from the specific mutations described above, i.e. those aimed at reducing the amount of one or more sense codons (e.g. replacements of one or more sense codons and/or refactoring) and those aimed at reducing the amount of amber stop codons, the synthetic prokaryotic genome may comprise 1000 or fewer, 100 or fewer, 50 or fewer, 20 or fewer, or 10 or fewer additional (i.e. non-programmed) mutations relative to the parent genome. Preferably the synthetic prokaryotic genome comprises 2×10⁻⁴ or fewer additional or non-programmed mutations per target codon (i.e. per occurrence of the one or more sense codons in the parent genome).
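The per-target-codon threshold is a simple ratio, sketched below. The counts used are hypothetical and chosen only to illustrate the arithmetic; the comparison is done with integers so that a rate of exactly 2×10⁻⁴ is treated as meeting the threshold.

```python
def mutations_per_target_codon(n_mutations, n_target_codons):
    """Non-programmed mutations divided by target codon occurrences."""
    if n_target_codons <= 0:
        raise ValueError("number of target codons must be positive")
    return n_mutations / n_target_codons


def within_preferred_rate(n_mutations, n_target_codons, max_per_10000=2):
    """True if the rate is at most max_per_10000 / 10,000 (i.e. 2e-4)."""
    if n_target_codons <= 0:
        raise ValueError("number of target codons must be positive")
    return n_mutations * 10_000 <= max_per_10000 * n_target_codons


# Hypothetical example: 3 additional mutations across 20,000 target codons.
rate = mutations_per_target_codon(3, 20_000)  # 1.5e-4
ok = within_preferred_rate(3, 20_000)
```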

Polynucleotides

The invention provides polynucleotides comprising one or more genes with no occurrences of one or more sense codons. The polynucleotides may comprise two or more, three or more, four or more, five or more, ten or more, twenty or more, thirty or more, forty or more, fifty or more, 100 or more, 200 or more, 500 or more, 600 or more, 700 or more, 800 or more, 900 or more, 1000 or more, 1500 or more, or 2000 or more genes with no occurrences of one or more sense codons. Preferably, the polynucleotides comprise 100 or more genes with no occurrences of one or more sense codons. More preferably, the polynucleotides comprise 1000 or more genes with no occurrences of one or more sense codons.

The one or more sense codons may consist of one, two, three, four, five, six, seven, or eight sense codons. Preferably, the one or more sense codons consist of one sense codon or two sense codons, most preferably two sense codons. Thus, in preferred embodiments the polynucleotides comprise 100 or more genes with no occurrences of one or two sense codons. In other preferred embodiments the polynucleotides comprise 1000 or more genes with no occurrences of one or two sense codons.

The one or more sense codons may be selected from: TCG, TCA, AGT, AGC, GCG, GCA, GTG, GTA, CTG, CTA, TTG, TTA, ACG, ACA, CCG, CCA, CGG, CGA, CGT, CGC, AGG, AGA, GGG, GGA, GGT, GGC, ATT, and ATC. Alternatively, the one or more sense codons may be selected from: TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA. Preferably, the one or more sense codons are selected from: TCG, TCA, AGT, AGC, GCG, GCA, CTG, CTA, TTG, and TTA. More preferably, the one or more sense codons are selected from TCG, TCA, TTG, TTA, GCG and GCA. Most preferably, the one or more sense codons are TCG and/or TCA.

The one or more sense codons in the genes may be replaced with synonymous sense codons. Preferably, the replacement is a defined replacement, i.e. one sense codon is replaced with a single synonymous sense codon.

For example, GCG may be replaced with GCT or GCC; GCA may be replaced with GCT or GCC; TCG may be replaced with TCT, TCC, AGT, or AGC; TCA may be replaced with TCT, TCC, AGT, or AGC; AGT may be replaced with TCG, TCA, TCT, or TCC; AGC may be replaced with TCG, TCA, TCT, or TCC; CTG may be replaced with CTT, CTC, TTG or TTA; CTA may be replaced with CTT, CTC, TTG or TTA; TTG may be replaced with CTG, CTA, CTT or CTC; or TTA may be replaced with CTG, CTA, CTT or CTC. Preferably the one or more defined sense codon replacements are selected from: GCG to GCT or GCC; GCA to GCT or GCC; TCG to AGT or AGC; TCA to AGT or AGC; AGT to TCA or TCT; AGC to TCG or TCC or TCA; TTG to CTT; and TTA to CTC. More preferably, TCG and/or TCA are replaced with AGC and/or AGT. Most preferably, TCG is replaced with AGC and/or TCA is replaced with AGT.

In some embodiments the genes are those for which there is evidence of translation and/or of the predicted protein product.

In preferred embodiments the genes are essential genes. The essential genes may be selected from one or more of the list consisting of: ribF, lspA, ispH, dapB, folA, imp, yabQ, ftsL, ftsI, murE, murF, mraY, murD, ftsW, murG, murC, ftsQ, ftsA, ftsZ, lpxC, secM, secA, can, folK, hemL, yadR, dapD, map, rpsB, tsf, pyrH, frr, dxr, ispU, cdsA, yaeL, yaeT, lpxD, fabZ, lpxA, lpxB, dnaE, accA, tilS, proS, yafF, hemB, secD, secF, ribD, ribE, thiL, dxs, ispA, dnaX, adk, hemH, lpxH, cysS, folD, entD, mrdB, mrdA, nadD, holA, rlpB, leuS, lnt, glnS, fldA, cydA, infA, cydC, ftsK, lolA, serS, rpsA, msbA, lpxK, kdsB, mukF, mukE, mukB, asnS, fabA, mviN, rne, fabD, fabG, acpP, tmk, holB, lolC, lolD, lolE, purB, minE, minD, pth, prsA, ispE, lolB, hemA, prfA, prmC, kdsA, topA, ribA, fabI, tyrS, ribC, ydiL, pheT, pheS, rplT, infC, thrS, nadE, gapA, yeaZ, aspS, argS, pgsA, yefM, metG, folE, yejM, gyrA, nrdA, nrdB, folC, accD, fabB, gltX, ligA, zipA, dapE, dapA, der, hisS, ispG, suhB, tadA, acpS, era, rnc, lepB, rpoE, pssA, yfiO, rplS, trmD, rpsP, ffh, grpE, csrA, ispF, ispD, ftsB, eno, pyrG, chpR, lgt, fbaA, pgk, yqgD, metK, yqgF, plsC, ygiT, parE, ribB, cca, ygjD, tdcF, yraL, yhbV, infB, nusA, ftsH, obgE, rpmA, rplU, ispB, murA, yrbB, yrbK, yhbN, rpsI, rplM, degS, mreD, mreC, mreB, accB, accC, yrdC, def, fmt, rplQ, rpoA, rpsD, rpsK, rpsM, secY, rplO, rpmD, rpsE, rplR, rplF, rpsH, rpsN, rplE, rplX, rplN, rpsQ, rpmC, rplP, rpsC, rplV, rpsS, rplB, rplW, rplD, rplC, rpsJ, fusA, rpsG, rpsL, trpS, yrfF, asd, rpoH, ftsX, ftsE, ftsY, yhhQ, bcsB, glyQ, gpsA, rfaK, kdtA, coaD, rpmB, dfp, dut, gmk, spoT, gyrB, dnaN, dnaA, rpmH, rnpA, yidC, tnaB, glmS, glmU, wzyE, hemD, hemC, yigP, ubiB, ubiD, hemG, yihA, ftsN, murI, murB, birA, secE, nusG, rplJ, rplL, rpoB, rpoC, ubiA, plsB, lexA, dnaB, ssb, alsK, groS, psd, orn, yjeE, rpsR, chpS, ppa, valS, yjgP, yjgQ, and dnaC.

Preferably, the essential genes may be selected from one or more of the list consisting of: ribF, lspA, ispH, dapB, folA, imp, yabQ, lpxC, secM, secA, can, folK, hemL, yadR, dapD, map, rpsB, tsf, pyrH, frr, dxr, ispU, cdsA, yaeL, yaeT, lpxD, fabZ, lpxA, lpxB, dnaE, accA, tilS, proS, yafF, hemB, secD, secF, ribD, ribE, thiL, dxs, ispA, dnaX, adk, hemH, lpxH, cysS, folD, entD, mrdB, mrdA, nadD, holA, rlpB, leuS, lnt, glnS, fldA, cydA, infA, cydC, ftsK, lolA, serS, rpsA, msbA, lpxK, kdsB, mukF, mukE, mukB, asnS, fabA, mviN, rne, fabD, fabG, acpP, tmk, holB, lolC, lolD, lolE, purB, minE, minD, pth, prsA, ispE, lolB, hemA, prfA, prmC, kdsA, topA, ribA, fabI, tyrS, ribC, ydiL, pheT, pheS, rplT, infC, thrS, nadE, gapA, yeaZ, aspS, argS, pgsA, yefM, metG, folE, yejM, gyrA, nrdA, nrdB, folC, accD, fabB, gltX, ligA, zipA, dapE, dapA, der, hisS, ispG, suhB, tadA, acpS, era, rnc, lepB, rpoE, pssA, yfiO, rplS, trmD, rpsP, ffh, grpE, csrA, ispF, ispD, ftsB, eno, pyrG, chpR, lgt, fbaA, pgk, yqgD, metK, yqgF, plsC, ygiT, parE, ribB, cca, ygjD, tdcF, yraL, yhbV, infB, nusA, ftsH, obgE, rpmA, rplU, ispB, murA, yrbB, yrbK, yhbN, rpsI, rplM, degS, mreD, mreC, mreB, accB, accC, yrdC, def, fmt, rplQ, rpoA, rpsD, rpsK, rpsM, secY, rplO, rpmD, rpsE, rplR, rplF, rpsH, rpsN, rplE, rplX, rplN, rpsQ, rpmC, rplP, rpsC, rplV, rpsS, rplB, rplW, rplD, rplC, rpsJ, fusA, rpsG, rpsL, trpS, yrfF, asd, rpoH, ftsX, ftsE, ftsY, yhhQ, bcsB, glyQ, gpsA, rfaK, kdtA, coaD, rpmB, dfp, dut, gmk, spoT, gyrB, dnaN, dnaA, rpmH, rnpA, yidC, tnaB, glmS, glmU, wzyE, hemD, hemC, yigP, ubiB, ubiD, hemG, yihA, ftsN, murI, murB, birA, secE, nusG, rplJ, rplL, rpoB, rpoC, ubiA, plsB, lexA, dnaB, ssb, alsK, groS, psd, orn, yjeE, rpsR, chpS, ppa, valS, yjgP, yjgQ, and dnaC.

Accordingly, the invention provides polynucleotides comprising one or more essential genes with no TCG codons and/or TCA codons, wherein the one or more essential genes is selected from the list consisting of: ribF, lspA, ispH, dapB, folA, imp, yabQ, lpxC, secM, secA, can, folK, hemL, yadR, dapD, map, rpsB, tsf, pyrH, frr, dxr, ispU, cdsA, yaeL, yaeT, lpxD, fabZ, lpxA, lpxB, dnaE, accA, tilS, proS, yafF, hemB, secD, secF, ribD, ribE, thiL, dxs, ispA, dnaX, adk, hemH, lpxH, cysS, folD, entD, mrdB, mrdA, nadD, holA, rlpB, leuS, lnt, glnS, fldA, cydA, infA, cydC, ftsK, lolA, serS, rpsA, msbA, lpxK, kdsB, mukF, mukE, mukB, asnS, fabA, mviN, rne, fabD, fabG, acpP, tmk, holB, lolC, lolD, lolE, purB, minE, minD, pth, prsA, ispE, lolB, hemA, prfA, prmC, kdsA, topA, ribA, fabI, tyrS, ribC, ydiL, pheT, pheS, rplT, infC, thrS, nadE, gapA, yeaZ, aspS, argS, pgsA, yefM, metG, folE, yejM, gyrA, nrdA, nrdB, folC, accD, fabB, gltX, ligA, zipA, dapE, dapA, der, hisS, ispG, suhB, tadA, acpS, era, rnc, lepB, rpoE, pssA, yfiO, rplS, trmD, rpsP, ffh, grpE, csrA, ispF, ispD, ftsB, eno, pyrG, chpR, lgt, fbaA, pgk, yqgD, metK, yqgF, plsC, ygiT, parE, ribB, cca, ygjD, tdcF, yraL, yhbV, infB, nusA, ftsH, obgE, rpmA, rplU, ispB, murA, yrbB, yrbK, yhbN, rpsI, rplM, degS, mreD, mreC, mreB, accB, accC, yrdC, def, fmt, rplQ, rpoA, rpsD, rpsK, rpsM, secY, rplO, rpmD, rpsE, rplR, rplF, rpsH, rpsN, rplE, rplX, rplN, rpsQ, rpmC, rplP, rpsC, rplV, rpsS, rplB, rplW, rplD, rplC, rpsJ, fusA, rpsG, rpsL, trpS, yrfF, asd, rpoH, ftsX, ftsE, ftsY, yhhQ, bcsB, glyQ, gpsA, rfaK, kdtA, coaD, rpmB, dfp, dut, gmk, spoT, gyrB, dnaN, dnaA, rpmH, rnpA, yidC, tnaB, glmS, glmU, wzyE, hemD, hemC, yigP, ubiB, ubiD, hemG, yihA, ftsN, murI, murB, birA, secE, nusG, rplJ, rplL, rpoB, rpoC, ubiA, plsB, lexA, dnaB, ssb, alsK, groS, psd, orn, yjeE, rpsR, chpS, ppa, valS, yjgP, yjgQ, and dnaC. Preferably, the polynucleotides comprise two or more, three or more, four or more, five or more, ten or more, twenty or more, thirty or more, forty or more, fifty or more, 100 or more, or 200 or more essential genes with no TCG codons and/or TCA codons.

In some embodiments the polynucleotide comprises a polynucleotide sequence which is at least 80%, 85%, 90%, 95%, 98%, 99%, 99.5%, 99.8%, 99.9%, or 100% identical to SEQ ID NO:1 or SEQ ID NO:2 or to any fragment of SEQ ID NO:1 or SEQ ID NO:2, preferably wherein the fragment is at least 10 kb, 20 kb, 50 kb, 100 kb, or 500 kb in length.

Preferably the polynucleotide is viable, i.e. the polynucleotide may be incorporated into a genome such that the genome is a viable genome. Preferably, the polynucleotide may replace a corresponding region of the parent genome and retain viability of said genome. As used herein, a “viable genome” refers to a genome that contains nucleic acid sequences sufficient to cause and/or sustain viability of a cell, e.g., those encoding molecules required for replication, transcription, translation, energy production, transport, production of membranes and cytoplasmic components, and cell division. Thus, the present invention also provides a viable synthetic prokaryotic genome (e.g. a viable synthetic E. coli genome) comprising the polynucleotide of the present invention.

The invention provides a polynucleotide which is at least 98%, 98.5%, 99%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9%, 99.95% or 100% identical to SEQ ID NO:1 or SEQ ID NO:2 or to any fragment of SEQ ID NO:1 or SEQ ID NO:2, preferably wherein the fragment is at least 10 kb, 20 kb, 50 kb, 100 kb, or 500 kb in length.

Host Cells and Uses Thereof

Host Cells

The invention also provides a host cell comprising the synthetic prokaryotic genome or the polynucleotide of the invention. The host cell may be an isolated host cell.

The host cell of the present invention is a prokaryotic cell. More preferably, the host cell is a bacterial cell. Preferably the bacterial host cell is suitable for heterologous protein production, in particular the production of polypeptides comprising one or more non-proteinogenic amino acids (for instance those described by Ferrer-Miralles, N. and Villaverde, A., 2013. Microbial Cell Factories, 12:113). Suitable bacterial host cells include: escherichia (e.g. Escherichia coli), caulobacteria (e.g. Caulobacter crescentus), phototrophic bacteria (e.g. Rhodobacter sphaeroides), cold adapted bacteria (e.g. Pseudoalteromonas haloplanktis, Shewanella sp. strain Ac10), pseudomonads (e.g. Pseudomonas fluorescens, Pseudomonas putida, Pseudomonas aeruginosa), halophilic bacteria (e.g. Halomonas elongata, Chromohalobacter salexigens), streptomycetes (e.g. Streptomyces lividans, Streptomyces griseus), nocardia (e.g. Nocardia lactamdurans), mycobacteria (e.g. Mycobacterium smegmatis), coryneform bacteria (e.g. Corynebacterium glutamicum, Corynebacterium ammoniagenes, Brevibacterium lactofermentum), bacilli (e.g. Bacillus subtilis, Bacillus brevis, Bacillus megaterium, Bacillus licheniformis, Bacillus amyloliquefaciens), and lactic acid bacteria (e.g. Lactococcus lactis, Lactobacillus plantarum, Lactobacillus casei, Lactobacillus reuteri, Lactobacillus gasseri). In some embodiments the bacterial host cell is a Gram-negative bacterium.

Preferably, the host cell is an Escherichia coli, Salmonella enterica, or Shigella dysenteriae.

More preferably, the host cell is an E. coli. Suitable E. coli host cells include MDS42, K-12, MG1655, BL21, BL21(DE3), AD494, Origami, HMS174, BLR(DE3), HMS174(DE3), Tuner(DE3), Origami2(DE3), Rosetta2(DE3), Lemo21(DE3), NiCo21(DE3), T7 Express, SHuffle Express, C41(DE3), C43(DE3), and m15 pREP4 or derivatives thereof (Rosano, G. L. and Ceccarelli, E. A., 2014. Frontiers in microbiology, 5, p. 172). Most preferably, the host cell is MDS42, MG1655, or BL21 or a derivative thereof. MG1655 is considered the wild-type strain of E. coli. The GenBank ID of the genomic sequence of this strain is U00096. BL21 is widely available commercially. For example, it can be purchased from New England BioLabs with catalog number C2530H.

The host cell may preferably be the same as that from which the synthetic prokaryotic genome or polynucleotide originates (or is derived). For example, if the synthetic prokaryotic genome is a synthetic E. coli genome then the host cell is preferably an E. coli. When the parent genome of a cell has been modified to produce the synthetic prokaryotic genome of the present invention, the host cell is preferably the same cell, i.e. preferably the host cell comprising the synthetic prokaryotic genome is the same as the host cell of the parent genome (the parent host cell).

The host cell may be viable, i.e. able to grow and replicate.

When the genome of a cell has been modified to produce the synthetic prokaryotic genome of the present invention, the synthetic prokaryotic genome is preferably one which, when present in the parent host cell, does not substantially decrease the growth rate. Thus, preferably the host cell comprising the synthetic prokaryotic genome does not have a substantially decreased growth rate relative to the host cell comprising the parent genome. In some embodiments the host cell comprising the synthetic prokaryotic genome has a doubling time that is less than 4 times, 3 times, 2 times, or about 1.6 times, that of the host cell comprising the parent genome. The doubling time can be determined by any method known to those of skill in the art. In some embodiments the doubling time is determined at 37° C., 25° C. or 42° C., in LB media.
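The doubling-time comparison above can be illustrated with a minimal sketch. It assumes simple exponential growth between two optical density readings, and the function names and OD values are hypothetical illustrations rather than data from this specification; any method known to those of skill in the art may be used instead.

```python
import math

def doubling_time(od_start, od_end, elapsed_minutes):
    """Doubling time in minutes, assuming pure exponential growth
    between two optical density readings (an illustrative assumption)."""
    if od_start <= 0 or od_end <= od_start or elapsed_minutes <= 0:
        raise ValueError("need 0 < od_start < od_end and a positive elapsed time")
    return elapsed_minutes * math.log(2) / math.log(od_end / od_start)

def doubling_time_ratio(synthetic_minutes, parent_minutes):
    """Ratio of the synthetic strain's doubling time to the parent's."""
    return synthetic_minutes / parent_minutes

# Hypothetical readings: the parent doubles three times in 90 minutes,
# the synthetic strain doubles twice in 96 minutes.
parent_dt = doubling_time(0.05, 0.40, 90.0)
synthetic_dt = doubling_time(0.05, 0.20, 96.0)
ratio = doubling_time_ratio(synthetic_dt, parent_dt)

# Compare the ratio against one of the thresholds recited above.
within_twofold = ratio < 2.0
```

With these made-up readings the ratio is approximately 1.6, so the hypothetical synthetic strain would fall within the 2-fold threshold.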

When the genome of a cell has been modified to produce the synthetic prokaryotic genome of the present invention, the synthetic prokaryotic genome is preferably one which, when present in the parent host cell, does not cause any substantial phenotypical changes. Thus, preferably the host cell comprising the synthetic prokaryotic genome does not have any substantial phenotypical changes relative to the host cell comprising the parent genome. In some embodiments the host cell comprising the synthetic prokaryotic genome has a mean cell length less than 100%, 50%, or about 20% greater than the host cell comprising the parent genome. For example, the cell length may be about 1.5 to 3 microns. The cell length can be determined by any method known to those of skill in the art. In some embodiments the host cell comprising the synthetic prokaryotic genome has a proteome that is not substantially different from the proteome of the host cell comprising the parent genome. The proteome can be determined by any method known to those of skill in the art.

Reassignment to Alternative Canonical Amino Acids

In some embodiments the one or more sense codons (i.e. those removed from the parent genome) are reassigned to encode alternative canonical amino acids. For example, if TCG and TCA have been removed, one or both may be reassigned to encode a canonical amino acid other than serine (e.g. alanine).

For instance, the synthetic prokaryotic genome of the present invention substantially or completely lacks one or more sense codons. Therefore, one or more tRNAs or release factors may be deleted from the synthetic genome. For instance, a tRNA which decodes the one or more sense codons that have been replaced (or deleted) may be deleted from the synthetic prokaryotic genome. A tRNA which decodes one or more sense codons that have been replaced (or deleted) may be deleted and the synthetic prokaryotic genome will remain viable if the tRNA decodes only the one or more sense codons that have been replaced (or deleted); or alternatively if the tRNA decodes one or more sense codons that have been replaced (or deleted) and one or more sense codons that have not been replaced (or deleted), if the tRNA is dispensable for the one or more sense codons that have not been replaced (or deleted) (i.e. the one or more remaining sense codons which the tRNA decodes are decoded by one or more alternative tRNAs). For example, if the synthetic prokaryotic genome lacks TCA sense codons, serT, encoding tRNA^(Ser) _(UGA), may be deleted and/or if the synthetic prokaryotic genome lacks TCG sense codons, serU, encoding tRNA^(Ser) _(CGA), may be deleted. The deletion of one or more tRNAs may be used, for instance, in combination with a reassigned, endogenous tRNA or an orthogonal aminoacyl-tRNA synthetase/tRNA pair to reassign the one or more sense codons to an alternative amino acid.

For example, if TCG and TCA have been removed from the synthetic prokaryotic genome, serT, encoding tRNA^(Ser) _(UGA), and serU, encoding tRNA^(Ser) _(CGA), may be deleted from the synthetic prokaryotic genome, and either the tRNA_(CGA) can be reassigned (e.g. to tRNA^(Ala) _(CGA)) or an orthogonal aminoacyl-tRNA synthetase/tRNA_(CGA) pair may be introduced to the host cell (e.g. by a heterologous nucleic acid or by incorporation into the synthetic prokaryotic genome) to reassign TCG to an alternative canonical amino acid. Thus, in some embodiments, the host cell of the present invention further comprises one or more reassigned tRNAs and/or one or more heterologous nucleotides (e.g. plasmids) encoding one orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair. In some embodiments the host cell of the present invention further comprises a plasmid encoding an orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair. Alternatively, the orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair may be introduced into the host cell by incorporation into the synthetic prokaryotic genome. Thus, in some embodiments the synthetic prokaryotic genome encodes an orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair, preferably wherein the gene encoding the native tRNA has been deleted from the parent prokaryotic genome. In preferred embodiments the host cell of the present invention further comprises one or more reassigned tRNAs. Methods for reassigning tRNAs will be well known to those of skill in the art.

The reassignment to encode alternative canonical amino acids may increase biosafety. Thus, in some embodiments the host cell of the present invention has increased biosafety. Accordingly, the present invention provides host cells with improved biosafety.

For example, the reassignment to encode alternative canonical amino acids may render the host cell comprising the synthetic prokaryotic genome resistant to bacteriophage infection. One or more bacteriophage genes will typically comprise the one or more sense codons, thus when the one or more bacteriophage genes are translated an alternative canonical amino acid may be incorporated into the corresponding bacteriophage proteins. The incorporation of an alternative canonical amino acid may destabilise, disrupt or reduce the activity of said proteins, thus reducing the infectivity of the bacteriophage and rendering the host cell resistant to bacteriophage infection.
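The effect of reassignment on an incoming bacteriophage gene can be sketched as follows. This is an illustration only: the two short sequences are invented, the reassignment of TCG and TCA to alanine follows the example given earlier in this section, and the use of AGC and AGT as replacement serine codons in the recoded host gene is an assumption made for the sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds, code):
    """Translate an in-frame DNA coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        residue = code[cds[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Hypothetical reassigned code: TCG and TCA now read as alanine (A).
REASSIGNED_CODE = {**STANDARD_CODE, "TCG": "A", "TCA": "A"}

phage_gene = "ATGTCGAAATCATAA"  # invented; still contains TCG and TCA
host_gene = "ATGAGCAAAAGTTAA"   # invented; serine codons already recoded

phage_in_wild_type = translate(phage_gene, STANDARD_CODE)
phage_in_recoded = translate(phage_gene, REASSIGNED_CODE)
host_in_recoded = translate(host_gene, REASSIGNED_CODE)
```

In this sketch the phage protein acquires alanine at every former serine position when read with the reassigned code, whereas the recoded host gene yields the same protein under both codes.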

Thus, in some embodiments the host cell of the present invention is resistant to phage infection. For example, when the genome of a cell has been modified to produce the synthetic prokaryotic genome of the present invention, the synthetic prokaryotic genome may be one which, when present in the parent host cell, increases resistance to phage infection. Thus, in some embodiments the host cell comprising the synthetic prokaryotic genome has increased phage resistance relative to the host cell comprising the parent genome.

Accordingly, the present invention provides phage-resistant host cells and host cells with increased phage resistance.

The reassignment to encode alternative canonical amino acids may also allow genetic material, e.g. antibiotic resistance genes, to be designed such that it is functional in the recoded strain, but not in wild-type strains. For example, the genetic material may be incorporated into the host cell of the present invention (e.g. by a heterologous nucleic acid or by incorporation into the synthetic prokaryotic genome) such that the host cell will grow in certain conditions (e.g. in the presence of an antibiotic), but other host cells (e.g. the parent host cell) will not. Thus, in some embodiments the host cell of the present invention may render a composition comprising the host cell more resistant to contamination by other host cells (e.g. other prokaryotes).

Reassignment to Non-Proteinogenic Amino Acids

In some embodiments the one or more sense codons (i.e. those removed from the parent genome) are reassigned to encode non-canonical amino acids (non-proteinogenic amino acids).

Thus, the present invention provides for use of a host cell according to the present invention for producing polypeptides comprising one or more non-proteinogenic amino acids, preferably two or more non-proteinogenic amino acids, most preferably three or more non-proteinogenic amino acids.

The present invention also provides polypeptides obtained or obtainable by using a host cell according to the present invention. In some embodiments, the polypeptides comprise one or more non-proteinogenic amino acids, preferably two or more non-proteinogenic amino acids, most preferably three or more non-proteinogenic amino acids. Thus, the present invention also provides polypeptides comprising two or more non-proteinogenic amino acids and polypeptides comprising three or more non-proteinogenic amino acids.

As used herein, “non-proteinogenic amino acids” (also known as “non-coded amino acids” or “noncanonical amino acids”) are amino acids that are not naturally encoded or found in the genetic code. Despite the use of only 22 amino acids by the translational machinery to assemble proteins (the proteinogenic amino acids: 20 in the standard genetic code and an additional 2 that can be incorporated by special translation mechanisms), over 140 amino acids are known to occur naturally in proteins and thousands more may occur in nature or be synthesized in the laboratory. Thus, non-proteinogenic amino acids may comprise any amino acid excluding L-alanine, L-cysteine, L-aspartic acid, L-glutamic acid, L-phenylalanine, glycine, L-histidine, L-isoleucine, L-lysine, L-leucine, L-methionine, L-asparagine, L-proline, L-glutamine, L-arginine, L-serine, L-threonine, L-valine, L-tryptophan and L-tyrosine, and optionally L-pyrrolysine and L-selenocysteine.

In some embodiments, the non-proteinogenic amino acids are unnatural amino acids (UAAs).

The non-proteinogenic amino acid or UAA is not particularly limited. Suitable non-proteinogenic amino acids and UAAs will be well known to those of skill in the art, for example those disclosed in Neumann, H., 2012. FEBS letters, 586(15), pp. 2057-2064; and Liu, C. C. and Schultz, P. G., 2010. Annual review of biochemistry, 79, pp. 413-444. In some embodiments the non-proteinogenic amino acids and/or UAAs are selected from one or more of: p-Acetylphenylalanine, m-Acetylphenylalanine, O-allyltyrosine, Phenylselenocysteine, p-Propargyloxyphenylalanine, p-Azidophenylalanine, p-Boronophenylalanine, O-methyltyrosine, p-Aminophenylalanine, p-Cyanophenylalanine, m-Cyanophenylalanine, p-Fluorophenylalanine, p-Iodophenylalanine, p-Bromophenylalanine, p-Nitrophenylalanine, L-DOPA, 3-Aminotyrosine, 3-Iodotyrosine, p-Isopropylphenylalanine, 3-(2-Naphthyl)alanine, Biphenylalanine, Homoglutamine, D-tyrosine, p-Hydroxyphenyllactic acid, 2-Aminocaprylic acid, Bipyridylalanine, HQ-alanine, p-Benzoylphenylalanine, o-Nitrobenzylcysteine, o-Nitrobenzylserine, 4,5-Dimethoxy-2-nitrobenzylserine, o-Nitrobenzyllysine, o-Nitrobenzyltyrosine, 2-Nitrophenylalanine, Dansylalanine, p-Carboxymethylphenylalanine, 3-Nitrotyrosine, Sulfotyrosine, Acetyllysine, Methylhistidine, 2-Aminononanoic acid, 2-Aminodecanoic acid, Pyrrolysine, Cbz-lysine, Boc-lysine and Allyloxycarbonyllysine.

Prokaryotes, e.g. E. coli, are not typically able to incorporate most eukaryotic post-translational modifications, such as ubiquitination, glycosylation and phosphorylation, nor are they typically capable of other eukaryotic maturation processes or proteolytic protein maturation. In addition, correct disulphide bond formation and lipopolysaccharide contamination can be troublesome (see Ovaa, H., 2014. Frontiers in chemistry, 2, p. 15). However, therapeutic proteins, such as antibodies, enzymes and cytokines, commonly carry post-translational modifications and disulphide bonds, and often require proteolytic maturation to attain their correctly folded state. Thus, the majority of therapeutic proteins are produced in eukaryotic and mammalian cell systems. However, expression in prokaryotic host cells, e.g. E. coli, is in general cheaper, more amenable to genetic modification, more versatile with regard to mutant library development, and suitable for industrial scale fermentation (Ovaa, H., 2014. Frontiers in chemistry, 2, p. 15).

Thus, in some embodiments the polypeptides are therapeutic polypeptides, preferably wherein mammalian protein modifications have been introduced via one or more non-proteinogenic amino acids. For example, amber codon suppression has previously been used to incorporate one or more non-proteinogenic amino acids (i.e. mammalian protein modifications) into therapeutic polypeptides. The present invention allows two or more non-proteinogenic amino acids to be incorporated. Thus, the present invention provides a therapeutic polypeptide comprising two or more non-proteinogenic amino acids.

The synthetic prokaryotic genome of the present invention substantially or completely lacks one or more sense codons, therefore one or more tRNAs or release factors may be deleted from the synthetic genome. For example, a tRNA which decodes only the one or more sense codons that have been replaced (or deleted) may be deleted from the synthetic prokaryotic genome. For example, if the synthetic prokaryotic genome lacks TCA sense codons, serT, encoding tRNA^(Ser) _(UGA), may be deleted and/or if the synthetic prokaryotic genome lacks TCG sense codons, serU, encoding tRNA^(Ser) _(CGA), may be deleted. The synthetic prokaryotic genome may then be used (in conjunction with an orthogonal aminoacyl-tRNA synthetase-tRNA pair) to direct the incorporation of non-proteinogenic amino acids into proteins.

Genetic code expansion uses an orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair to direct the incorporation of non-proteinogenic amino acids into proteins, in response to an unassigned codon (e.g. the amber stop codon, UAG) introduced at the desired site in a gene of interest. The orthogonal synthetase does not recognize endogenous tRNAs, and specifically aminoacylates an orthogonal cognate tRNA (which is not an efficient substrate for endogenous synthetases) with the non-proteinogenic amino acids provided to (or synthesized by) the cell (Chin, J. W., 2017. Nature, 550(7674), 53-60). The person skilled in the art would be able to identify and/or generate suitable orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pairs (e.g. Elliott, T. S. et al., 2014. Nat Biotechnol 32, 465-472; Elliott, T. S., et al., 2016. Cell Chem Biol 23, 805-815; and Krogager, T. P. et al., 2018. Nat Biotechnol 36, 156-159). Thus, in some embodiments, the host cell of the present invention further comprises one or more heterologous nucleotides (e.g. plasmids) encoding one orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair. In preferred embodiments the host cell of the present invention further comprises a plasmid encoding an orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair. Alternatively, the orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair may be introduced into the host cell by incorporation into the synthetic prokaryotic genome. Thus, in some embodiments the synthetic prokaryotic genome encodes an orthogonal aminoacyl-tRNA synthetase (aaRS)-tRNA pair, preferably wherein the gene encoding the native tRNA has been deleted from the parent prokaryotic genome.

Thus, in some embodiments the host cell of the present invention further comprises one or more heterologous nucleotides (e.g. plasmids) which comprise one or more genes comprising said sense codons. In preferred embodiments the host cell further comprises a plasmid comprising a gene comprising said sense codons. The one or more sense codons may be present in a desired site in the gene, preferably wherein the desired site allows incorporation of one or more non-proteinogenic amino acids (i.e. mammalian protein modifications) into polypeptides, preferably therapeutic polypeptides.

In other embodiments said sense codons may be present in one or more genes in the synthetic prokaryotic genome (for example, the heterologous nucleotide may be incorporated into the synthetic prokaryotic genome). The one or more sense codons may be present in a desired site in the gene, preferably wherein the desired site allows incorporation of one or more non-proteinogenic amino acids (i.e. mammalian protein modifications) into polypeptides, preferably therapeutic polypeptides.

For example, if TCG and TCA have been removed from the synthetic prokaryotic genome, serT, encoding tRNA^(Ser) _(UGA), and serU, encoding tRNA^(Ser) _(CGA), may be deleted from the synthetic prokaryotic genome, and an orthogonal aminoacyl-tRNA synthetase/tRNA_(CGA) pair may be used in combination with (heterologous) genes comprising the TCG codon, to encode polypeptides comprising one or more non-proteinogenic amino acids. Thus, the host cell of the present invention may, for instance, further comprise: (i) a plasmid encoding an orthogonal aminoacyl-tRNA synthetase/tRNA_(CGA) pair; and (ii) a plasmid comprising a gene comprising one or more TCG codons. Similarly, if AGT and AGC are removed, serV, encoding tRNA^(Ser) _(GCU), may be deleted from the synthetic prokaryotic genome, and an orthogonal aminoacyl-tRNA synthetase/tRNA_(ACU) pair and/or an orthogonal aminoacyl-tRNA synthetase/tRNA_(GCU) pair may be used. Similarly, if CTG and CTA are removed, leuP,Q,T,V, encoding tRNA^(Leu) _(CAG), and leuW, encoding tRNA^(Leu) _(UAG), may be deleted from the synthetic prokaryotic genome, and an orthogonal aminoacyl-tRNA synthetase/tRNA_(CAG) pair may be used. Similarly, if TTG and TTA are removed, leuX, encoding tRNA^(Leu) _(CAA), and leuZ, encoding tRNA^(Leu) _(UAA), may be deleted from the synthetic prokaryotic genome, and an orthogonal aminoacyl-tRNA synthetase/tRNA_(CAA) pair and/or an orthogonal aminoacyl-tRNA synthetase/tRNA_(UAA) pair may be used. Similarly, if GCG and GCA are removed, alaT,U,V, encoding tRNA^(Ala) _(UGC), may be deleted from the synthetic prokaryotic genome, and an orthogonal aminoacyl-tRNA synthetase/tRNA_(CGC) pair may be used.
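The anticodon subscripts used in the examples above follow from Watson-Crick pairing: the anticodon that exactly pairs with a codon is its reverse complement, written as RNA. The following minimal sketch derives those anticodons and looks up the tRNA genes named above. The dictionary merely restates the gene and anticodon pairs from this paragraph, the helper names are hypothetical, and wobble decoding is deliberately ignored.

```python
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}

def cognate_anticodon(codon):
    """Anticodon (5'->3', RNA) that pairs by strict Watson-Crick rules with a
    DNA-alphabet codon. Wobble pairing is ignored (a simplifying assumption)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))

# tRNA genes and anticodons as named in the paragraph above.
TRNA_GENES_BY_ANTICODON = {
    "UGA": "serT",
    "CGA": "serU",
    "GCU": "serV",
    "CAG": "leuP,Q,T,V",
    "UAG": "leuW",
    "CAA": "leuX",
    "UAA": "leuZ",
    "UGC": "alaT,U,V",
}

def trna_gene_for(codon):
    """Return the listed tRNA gene whose anticodon exactly matches the codon, or None."""
    return TRNA_GENES_BY_ANTICODON.get(cognate_anticodon(codon))

removed_codons = ["TCG", "TCA", "CTG", "CTA", "TTG", "TTA"]
anticodons = {codon: cognate_anticodon(codon) for codon in removed_codons}
candidate_deletions = {codon: trna_gene_for(codon) for codon in removed_codons}
```

A codon such as AGT has no exactly matching anticodon in the table; per the paragraph above it is associated with serV (tRNA^(Ser) _(GCU)), which is why a lookup based only on strict pairing returns None for it.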

In some embodiments the synthetic prokaryotic genome lacks genes encoding release factors (e.g. RF1) and/or the host cell lacks release factors (e.g. RF1) to increase the efficiency of incorporation of non-proteinogenic amino acids.

Method for Producing a Synthetic Genome

In one aspect, the invention provides a method for producing a synthetic genome comprising:

-   (a) providing a parent genome;
-   (b) carrying out one or more rounds of recombination-mediated genetic engineering on the parent genome, to produce two or more different partially synthetic genomes; and
-   (c) carrying out one or more rounds of directed conjugation with the two or more different partially synthetic genomes to produce a synthetic genome.

Recombination-Mediated Genetic Engineering

Preferably one or more rounds of recombination-mediated genetic engineering are used to edit 10-1000 kb, 50-1000 kb, 100-1000 kb, or 100-500 kb of the parent genome to provide two or more different partially synthetic genomes. Thus, in preferred embodiments each round of recombination-mediated genetic engineering inserts or replaces 10 kb or more, 50 kb or more, 100 kb or more, or about 100 kb of DNA in the parent genome.

As used herein, the term “recombination-mediated genetic engineering” (also known as “recombineering”) is a method for genetic engineering (i.e. editing genomes) based on homologous recombination systems. Typically recombineering is based on homologous recombination in Escherichia coli mediated by bacteriophage proteins, either RecE/RecT from Rac prophage or Redαβγ from bacteriophage lambda. Any suitable method of recombination-mediated genetic engineering may be used. Methods for recombination-mediated genetic engineering will be well known to those of skill in the art.

In “classical recombination” (exemplified by lambda red mediated recombination in E. coli), short regions of synthetic DNA may be inserted into the genome or used to replace genomic DNA in a two-step process: i) transformation of cells with linear double stranded DNA (dsDNA) carrying a stretch of synthetic DNA, coupled with a positive selection marker, and flanked by a homology region (HR) to the target region of the genome on each end, and ii) recombination mediated by the homologous regions, followed by selection for genomic integration by virtue of the positive selection marker. This approach can be used to insert or replace 2-3 kb of genomic DNA. Thus, if classical recombination is used, many rounds of recombination-mediated genetic engineering would be required to edit 100-500 kb of the parent genome.

Thus, in preferred embodiments the one or more rounds of recombination-mediated genetic engineering comprise one or more rounds of replicon excision for enhanced genome engineering through programmed recombination (REXER).

REXER is described in WO 2018/020248 (herein incorporated by reference). Each round of REXER may be used to insert or replace about 50 kb to 250 kb, or about 100 kb of DNA in the parent genome.
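The difference in scale between classical recombination (2-3 kb per round) and REXER (about 100 kb per round) can be made concrete with simple arithmetic. The sketch below assumes, for illustration only, that every round succeeds and that the edited segments do not overlap; the function name is hypothetical.

```python
import math

def rounds_needed(target_kb, kb_per_round):
    """Minimum number of rounds to cover target_kb at kb_per_round per round,
    assuming every round succeeds and segments do not overlap."""
    if target_kb <= 0 or kb_per_round <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(target_kb / kb_per_round)

# Classical recombination at 2-3 kb per round, for 100-500 kb of edits.
classical_best_case = rounds_needed(100, 3)
classical_worst_case = rounds_needed(500, 2)

# REXER at about 100 kb per round, for the same range.
rexer_best_case = rounds_needed(100, 100)
rexer_worst_case = rounds_needed(500, 100)
```

Under these assumptions, classical recombination would need between 34 and 250 rounds, whereas REXER would need between 1 and 5.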

Thus, the one or more rounds of recombination-mediated genetic engineering may comprise:

-   i) providing a host cell (e.g. E. coli), wherein the host cell comprises an episomal replicon (e.g. a plasmid or a bacterial artificial chromosome) and a target nucleic acid (e.g. the genome), wherein the episomal replicon comprises a donor nucleic acid sequence (i.e. a synthetic region), wherein the donor nucleic acid sequence comprises in order: 5′—homologous recombination sequence 1—sequence of interest—homologous recombination sequence 2—3′, wherein the sequence of interest comprises a positive selectable marker, and wherein the target nucleic acid comprises in order: 5′—homologous recombination sequence 1—negative selectable marker—homologous recombination sequence 2—3′;
-   ii) providing helper protein(s) capable of supporting nucleic acid recombination in said host cell (e.g. lambda Red proteins);
-   iii) providing helper protein(s) and/or RNAs capable of supporting nucleic acid excision in said host cell (e.g. CRISPR/Cas9 proteins/RNAs);
-   iv) inducing excision of said donor nucleic acid sequence;
-   v) incubating to allow recombination between the excised donor nucleic acid and said target nucleic acid; and
-   vi) selecting for recombinants having incorporated said donor nucleic acid into said target nucleic acid.
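The outcome of steps iv) to vi) can be pictured with a toy string model. This is a conceptual sketch only, not the REXER protocol: sequences are plain strings, the markers are bracketed placeholder labels, the homology sequences are invented, and the function names are hypothetical.

```python
def recombine(target, donor, hr1, hr2):
    """Replace the target segment spanning HR1..HR2 with the excised donor,
    which must itself begin with HR1 and end with HR2."""
    if not (donor.startswith(hr1) and donor.endswith(hr2)):
        raise ValueError("excised donor must begin with HR1 and end with HR2")
    start = target.find(hr1)
    if start == -1:
        raise ValueError("HR1 not found in target")
    end = target.find(hr2, start + len(hr1))
    if end == -1:
        raise ValueError("HR2 not found downstream of HR1 in target")
    return target[:start] + donor + target[end + len(hr2):]

def passes_selection(genome, positive_marker, negative_marker):
    """Gain of the positive marker together with loss of the negative marker."""
    return positive_marker in genome and negative_marker not in genome

# Invented placeholder sequences and marker labels.
HR1, HR2 = "AAACCCGGGT", "TTTGGGCCCA"
target = "gattaca" + HR1 + "<sacB>" + HR2 + "tgtaatc"
donor = HR1 + "recoded-region<CmR>" + HR2

recombinant = recombine(target, donor, HR1, HR2)
recombinant_selected = passes_selection(recombinant, "<CmR>", "<sacB>")
parent_selected = passes_selection(target, "<CmR>", "<sacB>")
```

In this model only the recombinant passes the combined selection, mirroring selection for gain of the positive selectable marker and loss of the negative selectable marker.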

Suitably selecting for recombinants having incorporated said donor nucleic acid into said target nucleic acid comprises selection for gain of the positive selectable marker of the donor nucleic acid and loss of the negative selectable marker of the target nucleic acid. Suitably selection for gain of the positive selectable marker of the donor nucleic acid and loss of the negative selectable marker of the target nucleic acid is carried out simultaneously. Suitably said sequence of interest comprises both a positive selectable marker and a negative selectable marker. Suitably the negative selectable marker is selected from the group consisting of sacB (sucrose sensitivity), rpsL (S12 ribosomal protein; streptomycin sensitivity), or pheS^(T251A_A294G) (4-chlorophenylalanine sensitivity). Suitably the positive selectable marker is selected from the group consisting of Cm^(R) (chloramphenicol resistance), Kan^(R) (kanamycin resistance), Hyg^(R) (hygromycin resistance), Gentamycin^(R) (gentamycin resistance), or tetracycline^(R) (tetracycline resistance). Suitably the step of selecting for recombinants comprises sequential selection for said positive and negative markers, or sequential selection for said negative and positive markers. Suitably the step of selecting for recombinants comprises simultaneous selection for said positive and negative markers.

Suitably said method as described above further comprises the step of inducing at least one double stranded break in the target nucleic acid sequence, wherein said double stranded break is between said homologous recombination sequence 1 and said homologous recombination sequence 2. Suitably at least two double stranded breaks are induced in the target nucleic acid sequence, wherein each said double stranded break is between said homologous recombination sequence 1 and said homologous recombination sequence 2.

Suitably said excised donor nucleic acid begins with said homologous recombination sequence 1 and ends with said homologous recombination sequence 2.

Suitably said episomal replicon comprises a negative selectable marker independent of the donor nucleic acid sequence. Suitably said method comprises the further step of selecting for loss of the episomal replicon by selecting for loss of said negative selectable marker independent of the donor nucleic acid sequence. Suitably said episomal replicon comprises in order: excision cut site 1—donor nucleic acid sequence—excision cut site 2. Suitably said target nucleic acid possesses its own origin of replication capable of functioning within said host cell. Suitably said episomal replicon is a plasmid nucleic acid. Suitably said episomal replicon is a bacterial artificial chromosome (BAC). Suitably said target nucleic acid is the host cell genome.

The episomal replicon (e.g. BAC) may be assembled by homologous recombination, for example in S. cerevisiae, as described in Kouprina, N., et al., 2004. Methods Mol Biol 255, 69-89. The assembly may combine: 7-14 stretches of synthetic DNA, each 6-13 kb in length; a selection construct (comprising a negative selection marker and/or a positive selection marker); and a BAC shuttle vector backbone. The stretches of synthetic DNA may collectively correspond to the donor nucleic acid sequence (i.e. the synthetic region) in the episomal replicon, wherein the stretches share 80-200 bp of overlapping DNA sequence with each other, and wherein the overlap regions are free of any recoding targets. The stretches may be supplied in pSC101 or pST vectors flanked by suitable restriction sites (e.g. BsaI, AvrII, SpeI, or XbaI). Thus, during assembly the synthetic DNA stretches may be excised by digestion with the corresponding restriction enzymes. Assembly of the episomal replicon may be verified by sequencing.
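The size ranges given above lend themselves to a simple design check. The sketch below assumes that overlaps occur only between neighbouring stretches and ignores any overlap with the selection construct or the vector backbone; both are simplifications made for illustration, and the function names are hypothetical.

```python
def check_assembly_design(stretch_lengths_bp, overlap_lengths_bp):
    """Return a list of problems; an empty list means the design fits the
    ranges stated above (7-14 stretches, 6-13 kb each, 80-200 bp overlaps)."""
    problems = []
    n = len(stretch_lengths_bp)
    if not 7 <= n <= 14:
        problems.append(f"{n} stretches (expected 7-14)")
    if len(overlap_lengths_bp) != max(n - 1, 0):
        problems.append("expected one overlap per pair of neighbouring stretches")
    for i, length in enumerate(stretch_lengths_bp, start=1):
        if not 6_000 <= length <= 13_000:
            problems.append(f"stretch {i} is {length} bp (expected 6-13 kb)")
    for i, overlap in enumerate(overlap_lengths_bp, start=1):
        if not 80 <= overlap <= 200:
            problems.append(f"overlap {i} is {overlap} bp (expected 80-200 bp)")
    return problems

def assembled_length_bp(stretch_lengths_bp, overlap_lengths_bp):
    """Length of the joined donor sequence, counting each overlap only once."""
    return sum(stretch_lengths_bp) - sum(overlap_lengths_bp)

# Hypothetical design: ten 10 kb stretches joined by nine 100 bp overlaps.
good_problems = check_assembly_design([10_000] * 10, [100] * 9)
good_length = assembled_length_bp([10_000] * 10, [100] * 9)

# Hypothetical out-of-range design: six 5 kb stretches with 50 bp overlaps.
bad_problems = check_assembly_design([5_000] * 6, [50] * 5)
```

The first design yields a donor of 99,100 bp, close to the roughly 100 kb replaced per round of REXER; the second is flagged for the number of stretches, every stretch length, and every overlap length.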

Suitably the two homology regions may be 30-100 bp, or 40-50 bp, or about 50 bp in length.

CRISPR/Cas9 machinery may be used for excision. In some embodiments the CRISPR/Cas9 machinery comprises Cas9, tracrRNA and two spacer RNAs, wherein the spacer RNAs target the two homology regions for excision. In preferred embodiments, the spacer RNAs are linear double stranded spacers. In other embodiments, the CRISPR/Cas9 machinery comprises Cas9 and two sgRNAs, wherein the sgRNAs target the two homology regions for excision.

Lambda red recombination machinery may be used for recombination. The lambda red recombination machinery may comprise lambda alpha/beta/gamma.

The method may comprise performing one or more rounds of REXER, i.e. the steps as described above with a first donor nucleic acid sequence, choosing further donor sequence(s) contiguous with said first donor nucleic acid sequence, and repeating said steps with said further donor nucleic acid sequence(s) until the partially synthetic genome has been assembled. This is known as genome stepwise interchange synthesis (GENESIS), described in Wang, K. et al., 2016. Nature 539, 59-64 and is shown schematically in FIG. 4.

In preferred embodiments the donor sequence(s) correspond to regions of the synthetic genome according to the present invention and/or to polynucleotides according to the present invention.

Thus, the donor sequence(s) (i.e. synthetic region) may comprise 20 or fewer occurrences of one or more sense codons; and/or the donor sequence(s) may comprise 10 or more, 20 or more, or 100 or more genes with no occurrences of one or more sense codons.

The donor sequence(s) (i.e. synthetic region) may be identical to sequences (i.e. non-synthetic regions) of the parent genome except that they have 50 or fewer, 20 or fewer, 10 or fewer, 5 or fewer, or 0 occurrences of each of one or more sense codons; and/or comprise less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of each of one or more sense codons, relative to the corresponding region in the parent genome; and/or comprise 10 or more, 20 or more, or 100 or more genes with no occurrences of one or more sense codons.
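The three measures above (absolute occurrences, percentage relative to the parent region, and number of genes free of the target codons) can be computed as in the following sketch. It counts in-frame codons only, and the gene sequences, the choice of TCG and TCA as targets, and the function names are illustrative assumptions.

```python
def in_frame_codons(cds):
    """Split a coding sequence into in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def count_codon(genes, codon):
    """Total in-frame occurrences of one codon across a list of coding sequences."""
    return sum(in_frame_codons(gene).count(codon) for gene in genes)

def percent_remaining(parent_genes, donor_genes, codon):
    """Occurrences in the donor region as a percentage of those in the parent region."""
    parent_count = count_codon(parent_genes, codon)
    if parent_count == 0:
        raise ValueError(f"{codon} does not occur in the parent region")
    return 100.0 * count_codon(donor_genes, codon) / parent_count

def genes_without(genes, target_codons):
    """Number of genes containing none of the target codons in frame."""
    targets = set(target_codons)
    return sum(1 for gene in genes if not targets & set(in_frame_codons(gene)))

# Invented two-gene region before and after recoding of TCG and TCA.
parent_region = ["ATGTCGTCATAA", "ATGTCGAAATAA"]
donor_region = ["ATGAGCAGTTAA", "ATGAGCAAATAA"]

parent_tcg = count_codon(parent_region, "TCG")
donor_tcg = count_codon(donor_region, "TCG")
tcg_percent = percent_remaining(parent_region, donor_region, "TCG")
clean_genes = genes_without(donor_region, ["TCG", "TCA"])
```

For a real genome the coding sequences would be taken from an annotation, and overlapping genes would need the separate handling described for refactoring.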

The donor sequence(s) (i.e. synthetic region) may also be refactored relative to the sequences (i.e. non-synthetic regions) of the parent genome. For 3′, 3′ overlaps (i.e. pairs of genes in opposite orientations) a synthetic insert may be inserted between the genes. For 3′, 3′ overlaps the synthetic insert may comprise the overlapping region. For 5′, 3′ overlaps (i.e. pairs of genes in the same orientation) a synthetic insert may be inserted between the genes. For 5′, 3′ overlaps the synthetic insert may comprise: (i) a stop codon; (ii) about 20-200 bp, or 20-100 bp, or 20-50 bp, from upstream of the overlapping region; and (iii) the overlapping region. Preferably, the synthetic insert comprises: (i) a stop codon; (ii) about 20 bp from upstream of the overlapping region; and (iii) the overlapping region. In preferred embodiments the stop codon is in frame with the original start site for the downstream gene. Preferably the stop codon is TAA.

Preferably the donor sequence(s) (i.e. synthetic region) are collectively 50-10000 kb, 100-5000 kb, 100-2000 kb, 100-1000 kb, or 100-500 kb in size. Preferably each donor sequence is 50-300 kb, 100-200 kb, or about 100 kb in size.

Accordingly, the donor sequences may each be about 100 kb in size and identical to corresponding sequences of the parent genome, except that they comprise no occurrences of one or more sense codons and that all pairs of genes which share an overlapping region comprising the one or more sense codons in the parent genome are refactored, wherein the pairs of genes are those in which sense codon replacements would change the encoded protein sequence of both or either of the pair of genes.

In preferred embodiments the viability of the genome is tested after each round of recombination-mediated genetic engineering. In some embodiments the sequence of the genome is verified after each round of recombination-mediated genetic engineering.

Partially Synthetic Genomes

The present invention provides two or more different partially synthetic genomes.

As used herein a “partially synthetic genome” is a genome in which one or more contiguous regions of the parent genome have been edited (i.e. the partially synthetic genomes comprise one or more synthetic regions), wherein the one or more contiguous (synthetic) regions do not cover the whole of the parent genome. Preferably, the partially synthetic genomes of the present invention have one contiguous (synthetic) region. In contrast, a “synthetic genome” may comprise genome edits which cover substantially all of the parent genome.

The partially synthetic genomes of the present invention may be prokaryotic genomes. Preferably, the partially synthetic genomes of the present invention are bacterial genomes. More preferably, the partially synthetic genomes of the present invention are Escherichia coli, Salmonella enterica, or Shigella dysenteriae genomes. Most preferably, the partially synthetic genomes of the present invention are E. coli genomes. In some embodiments the partially synthetic genomes are reduced or minimal partially synthetic genomes. In preferred embodiments, the partially synthetic genomes are viable genomes.

In some embodiments the partially synthetic genomes of the present invention are 100 kb to 20 Mb, or 130 kb to 15 Mb, or 200 kb to 15 Mb, or 300 kb to 15 Mb, or 500 kb to 15 Mb, or 1 Mb to 15 Mb, or 1 Mb to 10 Mb, or 1 Mb to 8 Mb, or 1 Mb to 6 Mb, or 2 Mb to 6 Mb, or 2 Mb to 5 Mb, or 3 Mb to 5 Mb, or about 4 Mb in size.

The partially synthetic genomes may comprise a synthetic region that has 50 or fewer, 20 or fewer, 10 or fewer, 5 or fewer, or 0 occurrences of each of one or more sense codons; or the partially synthetic genomes may comprise a synthetic region that has less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of each of one or more sense codons, relative to the corresponding region in the parent genome.
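The two criteria above (an absolute count and a percentage relative to the parent region) can be checked with a simple in-frame codon count. The following is a minimal sketch only: the function names and the toy sequences are hypothetical, and the ORFs are assumed to be supplied as coding-strand sequences already in frame.

```python
from collections import Counter


def count_codons(orfs):
    """Count in-frame codons over a list of ORF sequences (coding strand, 5'->3')."""
    counts = Counter()
    for seq in orfs:
        seq = seq.upper()
        # Step through the reading frame three bases at a time,
        # ignoring any incomplete trailing codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def remaining_fraction(parent_orfs, synthetic_orfs, codon):
    """Occurrences of `codon` in the synthetic region as a fraction of the parent region."""
    parent = count_codons(parent_orfs)[codon]
    synthetic = count_codons(synthetic_orfs)[codon]
    if parent == 0:
        raise ValueError(f"{codon} does not occur in the parent region")
    return synthetic / parent


# Toy data (hypothetical, not taken from any genome).
parent_region = ["ATGTCGTCATAG", "ATGTCGAAATAA"]
synthetic_region = ["ATGAGCAGTTAA", "ATGAGCAAATAA"]

tcg_in_parent = count_codons(parent_region)["TCG"]
tcg_fraction = remaining_fraction(parent_region, synthetic_region, "TCG")
```

A region would satisfy, for example, the "less than 1%" criterion for a given codon when `remaining_fraction(...)` returns a value below 0.01.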

Preferably, the synthetic regions are 50-10000 kb, 100-5000 kb, or 100-500 kb in size.

Thus, the partially synthetic genomes may comprise one or more contiguous regions of 100-5000 kb that have 10 or fewer, 5 or fewer, or no occurrences of each of one or more sense codons; and/or the partially synthetic genomes may comprise one or more contiguous regions of 100-5000 kb that have less than 10%, 5%, 2%, 1%, 0.5%, or 0.1% of the occurrences of each of one or more sense codons, relative to the corresponding region in the parent genome; and/or the partially synthetic genomes may comprise one or more contiguous regions of 100-5000 kb that have 10 or more, 20 or more, or 100 or more genes with no occurrences of one or more sense codons.

The remainder of the partially synthetic genome (i.e. the non-synthetic region(s)) may have un-altered sense codons. Thus, the partially synthetic genomes may comprise one or more non-synthetic region(s) that have 100% or 99% of the occurrences of each sense codon, relative to the corresponding region in the parent genome; and/or the partially synthetic genomes may comprise one or more non-synthetic region(s) that have 100 or more genes with occurrences of each sense codon. The non-synthetic regions may be 500 kb to 20 Mb, or 500 kb to 10 Mb, or 500 kb to 5 Mb, or about 3.5 Mb in size.

For example, the partially synthetic genomes may comprise one contiguous region (i.e. a synthetic region) of 100-5000 kb that has 10 or more, 20 or more, or 100 or more genes with no occurrences of one or more sense codons and one contiguous region of 500 kb-10000 kb (i.e. a non-synthetic region) that has 100 or more genes with occurrences of each sense codon.

The two or more different partially synthetic genomes may be derived from the same parent genome, i.e. comprise substantially the same sequences, e.g. the two or more different partially synthetic genomes may share 90%, 95%, 99%, or 99.5% sequence identity.

The two or more different partially synthetic genomes may comprise one or more synthetic regions, such that the synthetic regions collectively cover 90% or greater, 95% or greater, 99% or greater, or 100% of the parent genome. Preferably, the two or more different partially synthetic genomes each comprise one or more synthetic regions, wherein the synthetic regions do not substantially overlap (e.g. the overlap between synthetic regions is 10 kb or less, preferably about 3-4 kb). Thus, the two or more different partially synthetic genomes may each comprise one unique or substantially unique synthetic region.
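The coverage and overlap conditions above can be expressed as simple interval arithmetic. The sketch below is illustrative only: it treats the genome as linear with 0-based, half-open coordinates (a circular genome would need wrap-around handling), and the eight regions and the 4 Mb genome length are hypothetical values chosen to mirror the sizes mentioned in the text.

```python
def coverage_fraction(regions, genome_length):
    """Fraction of a linear genome covered by the union of (start, end) regions."""
    covered = 0
    current_end = 0
    for start, end in sorted(regions):
        # Skip any part already counted for an earlier region.
        start = max(start, current_end)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def max_pairwise_overlap(regions):
    """Largest overlap, in bases, between any two regions (0 if none overlap)."""
    best = 0
    for i, (s1, e1) in enumerate(regions):
        for s2, e2 in regions[i + 1:]:
            best = max(best, min(e1, e2) - max(s1, s2))
    return best


# Hypothetical layout: eight ~500 kb synthetic regions on a 4 Mb genome,
# each extending 3 kb into the next region.
GENOME_LENGTH = 4_000_000
synthetic_regions = [
    (i * 500_000, min((i + 1) * 500_000 + 3_000, GENOME_LENGTH))
    for i in range(8)
]

covered = coverage_fraction(synthetic_regions, GENOME_LENGTH)
largest_overlap = max_pairwise_overlap(synthetic_regions)
```

Under these assumptions the regions collectively cover the whole genome while no two regions overlap by more than 10 kb.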

Thus, in preferred embodiments the two or more different partially synthetic genomes each comprise one contiguous synthetic region of 100-5000 kb that has 10 or more, 20 or more, or 100 or more genes with no occurrences of one or more sense codons and one non-synthetic contiguous region of 500 kb-10000 kb that has 100 or more genes with occurrences of each sense codon; wherein the synthetic regions collectively cover substantially all of the parent genome and wherein the synthetic regions do not substantially overlap.

The two or more different partially synthetic genomes may be suitable for directed conjugation. Thus, in preferred embodiments the two or more different partially synthetic genomes comprise at least one partially synthetic donor genome and at least one partially synthetic recipient genome. The method of the invention may comprise a further step of one or more rounds of recombination-mediated genetic engineering, preferably lambda red mediated genetic engineering (prior to directed conjugation) to provide at least one partially synthetic donor genome and at least one partially synthetic recipient genome. The method may further comprise one or more rounds of selection for the at least one partially synthetic donor genome and at least one partially synthetic recipient genome.

The at least one partially synthetic donor genome may comprise a synthetic region and a first selectable marker flanked by two homology regions immediately downstream of an origin of transfer; and the at least one partially synthetic recipient genome may comprise a second selectable marker flanked by two corresponding homology regions, optionally wherein the first selectable marker comprises a positive selectable marker, and/or the second selectable marker comprises a negative selectable marker.

Suitably the negative selectable marker is selected from the group consisting of sacB (sucrose sensitivity), rpsL (S12 ribosomal protein; streptomycin sensitivity), or pheS^(T251A_A294G) (4-chlorophenylalanine sensitivity). Suitably the positive selectable marker is selected from the group consisting of Cm^(R) (chloramphenicol resistance), Kan^(R) (kanamycin resistance), Hyg^(R) (hygromycin resistance), Gentamycin^(R) (gentamycin resistance), or tetracycline^(R) (tetracycline resistance). The selectable markers may be different to those in the one or more steps of recombination-mediated genetic engineering.

Preferably the synthetic region present in the at least one partially synthetic recipient genome is outside the region flanked by the homology regions, i.e. the synthetic regions do not substantially overlap. Preferably the homology regions are 3 kb to 500 kb in length, most preferably about 3-5 kb.

Directed Conjugation

One or more rounds of directed conjugation may be carried out on the two or more different partially synthetic genomes of the present invention to produce a synthetic genome.

Each round of directed conjugation may be used to provide partially synthetic genomes with larger contiguous synthetic regions. For example, after the one or more rounds of recombination-mediated genetic engineering there may be 8 partially synthetic genomes, each with a contiguous synthetic region of about 500 kb. After a first round of directed conjugation, two of the partially synthetic genomes may be combined to provide 6 partially synthetic genomes, each with a contiguous synthetic region of about 500 kb, and 1 partially synthetic genome with a contiguous synthetic region of about 1 Mb. A second round may provide either 5 partially synthetic genomes, each with a contiguous synthetic region of about 500 kb, and 1 partially synthetic genome with a contiguous synthetic region of about 1.5 Mb; or 4 partially synthetic genomes, each with a contiguous synthetic region of about 500 kb, and 2 partially synthetic genomes, each with a contiguous synthetic region of about 1 Mb. After several rounds of directed conjugation a completely synthetic genome (i.e. one with a contiguous synthetic region of about 4 Mb) may be provided. An example is shown schematically in FIGS. 10 and 11b.
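The bookkeeping in this example can be followed with a small sketch that tracks only the sizes of the synthetic regions. It is a simplification under stated assumptions: the function name `conjugate` is hypothetical, sizes are nominal values in kb, the short homology overlaps are ignored, and no biological constraint (such as adjacency of the regions being combined) is modelled.

```python
def conjugate(regions_kb, donor, recipient):
    """Combine the synthetic regions of two partially synthetic genomes.

    `regions_kb` lists the synthetic-region size (kb) of each genome;
    `donor` and `recipient` are indices into that list.
    """
    if donor == recipient:
        raise ValueError("donor and recipient must be different genomes")
    merged = regions_kb[donor] + regions_kb[recipient]
    rest = [size for k, size in enumerate(regions_kb) if k not in (donor, recipient)]
    return sorted(rest + [merged])


start = [500] * 8                      # 8 genomes, ~500 kb synthetic region each
round_1 = conjugate(start, 0, 1)       # six of ~500 kb and one of ~1 Mb
round_2a = conjugate(round_1, 0, 6)    # five of ~500 kb and one of ~1.5 Mb
round_2b = conjugate(round_1, 0, 1)    # four of ~500 kb and two of ~1 Mb

# One possible route to a single ~4 Mb synthetic region.
genomes, rounds = start, 0
while len(genomes) > 1:
    genomes = conjugate(genomes, 0, 1)
    rounds += 1
```

Combining one pair per round, as here, takes seven rounds in this toy model; the text does not limit the route, so other orders of combination are equally possible.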

Any suitable method of directed conjugation may be used. Methods of directed conjugation will be well known to those of skill in the art, for instance as described by Ma, N. J., Moonan, D. W. and Isaacs, F. J., 2014. Nature Protocols, 9(10), p. 2285. The route to the synthetic genome is not limited.

Thus, the one or more rounds of directed conjugation may comprise:

    i) providing a first host cell comprising a partially synthetic recipient genome, and a second host cell comprising a partially synthetic donor genome and a conjugative plasmid;
    ii) a step of conjugation of the partially synthetic recipient genome and partially synthetic donor genome; and
    iii) selecting for recombinants having incorporated the synthetic region of the donor genome into the partially synthetic recipient genome.

The partially synthetic donor genome may comprise a synthetic region and a first selectable marker flanked by two homology regions immediately downstream of an origin of transfer; and the partially synthetic recipient genome may comprise a second selectable marker flanked by two corresponding homology regions, optionally wherein the first selectable marker comprises a positive selectable marker, and/or the second selectable marker comprises a negative selectable marker. Thus, step (iii) may comprise selection for said selectable markers, i.e. selection for gain of the first selectable marker and loss of the second selectable marker.
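The selection logic of step (iii) can be summarised as a two-condition test on the markers carried by a recipient-derived cell. This is a minimal sketch with hypothetical names; the choice of Cm^(R) as the first (positive) marker and sacB as the second (negative) marker is an assumption for illustration, and the sketch does not model how donor cells themselves are excluded.

```python
def is_selected_recombinant(markers, first_marker, second_marker):
    """True if a cell has gained the first marker and lost the second marker."""
    return first_marker in markers and second_marker not in markers


FIRST_MARKER = "CmR"    # assumed positive marker carried by the donor genome
SECOND_MARKER = "sacB"  # assumed negative marker carried by the recipient genome

# Hypothetical outcomes for recipient-derived cells after conjugation.
cells = {
    "unchanged recipient": {"sacB"},
    "desired recombinant": {"CmR"},
    "partial event": {"CmR", "sacB"},
}

selected = [
    name
    for name, markers in cells.items()
    if is_selected_recombinant(markers, FIRST_MARKER, SECOND_MARKER)
]
```

In practice both conditions would be imposed by the growth medium (for these assumed markers, chloramphenicol together with sucrose) rather than computed.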

Suitably the negative selectable marker is selected from the group consisting of sacB (sucrose sensitivity), rpsL (S12 ribosomal protein; streptomycin sensitivity), or pheS^(T251A_A294G) (4-chlorophenylalanine sensitivity). Suitably the positive selectable marker is selected from the group consisting of Cm^(R) (chloramphenicol resistance), Kan^(R) (kanamycin resistance), Hyg^(R) (hygromycin resistance), Gentamycin^(R) (gentamycin resistance), or tetracycline^(R) (tetracycline resistance). The selectable markers may be different to those in the one or more steps of recombination-mediated genetic engineering.

Preferably the homology regions are 3 kb to 500 kb in length, most preferably about 3-5 kb. Preferably, the homology regions are 50 kb to 500 kb when the step of directed conjugation is the final step of directed conjugation.

Step (ii) may comprise incubating the first host cell and the second host cell. For example, the first host cell and the second host cell may be mixed, transferred onto a suitable medium (e.g. agar plates) and incubated at about 37° C. for about 1-3 hours.

The conjugative plasmid may be an F plasmid, preferably wherein the conjugative plasmid does not comprise an origin of transfer (e.g. FIG. 22C).

In preferred embodiments the viability of the genome is tested after each round of directed conjugation. Advantageously, this verifies that the genome edits (e.g. sense codon replacements) result in a viable genome, and allows for non-permitted edits to be corrected.

In some embodiments the sequence of the genome is verified after each round of directed conjugation.

The skilled person will understand that they can combine all features of the invention disclosed herein without departing from the scope of the invention as disclosed.

Preferred features and embodiments of the invention will now be described by way of non-limiting examples.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of chemistry, biochemistry, molecular biology, microbiology and immunology, which are within the capabilities of a person of ordinary skill in the art. Such techniques are explained in the literature. See, for example, Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995 and periodic supplements) Current Protocols in Molecular Biology, Ch. 9, 13 and 16, John Wiley & Sons; Roe, B., Crabtree, J. and Kahn, A. (1996) DNA Isolation and Sequencing: Essential Techniques, John Wiley & Sons; Polak, J. M. and McGee, J. O'D. (1990) In Situ Hybridization: Principles and Practice, Oxford University Press; Gait, M. J. (1984) Oligonucleotide Synthesis: A Practical Approach, IRL Press; and Lilley, D. M. and Dahlberg, J. E. (1992) Methods in Enzymology: DNA Structures Part A: Synthesis and Physical Analysis of DNA, Academic Press.

EXAMPLES

Example 1—Design of a Genome with Synonymous Codon Compression

We first designed a version of the E. coli MDS42 genome (GenBank accession number AP012306.1) in which the serine codons TCG and TCA and the stop codon TAG in open reading frames (ORFs) are systematically replaced by their synonyms AGC, AGT, and TAA, respectively (FIG. 1A, FIG. 18, SEQ ID NO: 1). We have previously shown that this defined recoding scheme for synonymous codon compression is allowed on a 20 kb region of the E. coli genome rich in essential genes (Wang, K. et al., 2016. Nature 539, 59-64). However, this region only accounts for 0.46% of the target codons in the genome.
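The recoding scheme itself (TCG to AGC, TCA to AGT, TAG to TAA) amounts to an in-frame codon substitution. The following is a minimal sketch for a single, non-overlapping ORF given as a coding-strand sequence; the function name and the example sequences are hypothetical, and overlapping ORFs, which need the refactoring described in this Example, are not handled.

```python
# Recoding scheme from the text: target codon -> synonymous replacement.
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_orf(orf):
    """Replace target codons in frame; all other codons are left unchanged."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(RECODING.get(codon, codon) for codon in codons)


# Hypothetical ORF: Met-Ser-Ser-Lys-stop.
recoded = recode_orf("ATGTCGTCAAAATAG")

# "TCG" spanning two codons (ATC GAA) is out of frame and is not a target.
untouched = recode_orf("ATGATCGAATAA")
```

Because the substitution is applied codon by codon, the encoded protein sequence is unchanged: serine codons are replaced by serine codons and the amber stop by the ochre stop.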

E. coli contains numerous overlapping open reading frames (ORFs), and we classify the overlaps as 3′, 3′ (between ORFs in opposite orientations) or 5′, 3′ (between ORFs in the same orientation). Targeted codons are found within both classes of overlap. If the recoding of each ORF within a 3′, 3′ overlap could be achieved without changing the encoded protein sequence of either ORF (i.e. by introducing synonymous codon(s)), then the overlap structure was maintained and the sequences were directly recoded. However, when this was not possible we duplicated the overlapping region, and individually recoded each ORF (FIG. 1B, Table 1).

For 5′, 3′ overlaps we separated the ORFs by duplicating both the region of overlap between the ORFs and the 20 bp sequence upstream of the overlap. This refactoring allows us to recode each ORF independently (FIG. 1C, Table 1). Our strategy preserves the sequence of the RBS for the downstream ORF and the distance between this RBS and its start codon.
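The two overlap classes described above can be distinguished from ORF coordinates and strand alone. The sketch below is an illustration under stated assumptions: the `ORF` record, the function name, and all coordinates are hypothetical; coordinates are 1-based and inclusive on the forward strand; and the "other" label for divergent or nested arrangements is an addition for completeness, not a class named in the text.

```python
from typing import NamedTuple


class ORF(NamedTuple):
    name: str
    start: int   # 1-based, inclusive, forward-strand coordinate
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_overlap(a, b):
    """Return (overlap class, overlap length in bp), or None if the ORFs do not overlap."""
    first, second = sorted((a, b), key=lambda orf: (orf.start, orf.end))
    length = min(first.end, second.end) - second.start + 1
    if length <= 0:
        return None
    # One ORF lying entirely inside the other is not covered by the two classes.
    nested = second.end <= first.end or second.start == first.start
    if nested:
        return ("other", length)
    if first.strand == second.strand:
        return ("5',3'", length)    # same orientation
    if first.strand == "+" and second.strand == "-":
        return ("3',3'", length)    # convergent: the two 3' ends overlap
    return ("other", length)        # divergent: the two 5' ends overlap


same_orientation = classify_overlap(ORF("a", 100, 200, "+"), ORF("b", 193, 300, "+"))
opposite_orientation = classify_overlap(ORF("c", 100, 200, "+"), ORF("d", 171, 300, "-"))
no_overlap = classify_overlap(ORF("e", 100, 200, "+"), ORF("f", 250, 300, "+"))
```

Only overlaps that contain a targeted codon, and that cannot be recoded synonymously in both ORFs, would then go on to be refactored.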

Using the defined rules for synonymous codon compression and refactoring we designed a genome in which all 18,218 target codons are recoded to their target synonyms (FIG. 1D).

TABLE 1 Overlaps and refactoring Listed are the 92 cases of overlapsaccounted for by refactoring in the MDS42 designed genome (FIG. 18, SEQID NO: 1). Provided is additional information about genomic location,surrounding genes, overlap length, codons changed, and the refactoringstrategy implemented. Up- Down- Over- Re- Overlap stream stream lapCodons factoring No. type gene gene Start End Start End length changedStrategy length 1 Head-to-tail kefF kefC 42,594 42,625 42,594 42,601 8 1Duplication + in-frame TAA 32 STOP codon + 20nt insertion 2 Head-to-tailftsI murE 88,023 88,060 87,991 88,004 14 1 Duplication + in-frame TAA 38STOP codon + 20nt insertion 3 Head-to-tail murF mraY 90,897 90,92890,827 90,833 7 1 Duplication + in-frame TAA 32 STOP codon + 20ntinsertion 4 Head-to-tail yaeQ yaeJ 203,847 203,875 203,694 203,697 4 1Duplication + in-frame TAA 29 STOP codon + 20nt insertion 5 Tail-to-tailyafJ yafK 234,139 234,168 233,927 233,956 30 1 Duplication 30 6Head-to-tail yahE yahF 263,179 263,213 262,967 262,977 11 1Duplication + in-frame TAA 35 STOP codon + 20nt insertion 7 Head-to-tailcodB codA 282,607 282,641 282,360 282,370 11 1 Duplication + in-frameTAA 35 STOP codon + 20nt insertion 8 Head-to-tail mhpD mhpF 299,392299,420 299,110 299,113 4 1 Duplication + in-frame TAA 29 STOP codon +20nt insertion 9 Head-to-tail yajL panE 351,696 351,757 351,347 351,38438 2 Duplication + in-frame TAA 62 STOP codon + 20nt insertion 10Head-to-tail mdlA mdlB 378,752 378,783 378,379 378,386 8 1 Duplication +in-frame TAA 32 STOP codon + 20nt insertion 11 Head-to-tail hemH aes407,162 407,162 406,757 406,760 4 1 Silent mutation CGC to CGT (Arg) 12Head-to-tail ybbL ybbM 424,731 424,768 424,326 424,339 14 1Duplication + in-frame TAA 38 STOP codon + 20nt insertion 13Head-to-tail ybbO tesA 427,336 427,370 426,882 426,892 11 1Duplication + in-frame TAA 35 STOP codon + 20nt insertion 14Head-to-tail citG citX 521,504 521,553 521,000 521,025 26 2Duplication + in-frame TAA 50 STOP codon + 20nt 
insertion 15Head-to-tail nei abrB 609,986 609,987 609,456 609,459 4 2 Silentmutations CAC to CAT (His), GCC to GCT (Ala) 16 Tail-to-tail ybhQ ybhR688,302 688,340 687,735 687,773 39 1 Duplication 39 17 Head-to-tail ybhGybiH 693,272 693,292 692,705 692,705 1 1 Duplication + in-frame TAA 21STOP codon + 20nt insertion 18 Head-to-tail yliI yliJ 743,178 743,178742,587 742,590 4 1 Silent mutation AGC to AGT (Ser) 19 Head-to-tailybjR ybjS 769,064 769,064 768,473 768,476 4 1 Silent mutation CGC to CGT(Arg) 20 Head-to-tail ycaR kdsB 834,173 834,201 833,585 833,588 4 1Duplication + in-frame TAA 29 STOP codon + 20nt insertion 21Tail-to-tail ycbJ ycbC 835,996 836,019 835,355 835,378 24 2 Duplication24 22 Head-to-tail ycbW ycbX 869,865 869,865 869,224 869,227 4 1 Silentmutation GTC to GTT (Val) 23 Tail-to-tail yccR yccS 885,142 885,179884,463 884,500 38 1 Duplication 38 24 Head-to-tail hyaC hyaD 899,182899,210 898,503 898,506 4 1 Duplication + in-frame TAA 29 STOP codon +20nt insertion 25 Head-to-tail hyaE hyaF 900,190 900,218 899,482 899,4854 1 Duplication + in-frame TAA 29 STOP codon + 20nt insertion 26Tail-to-tail torT torR 912,244 912,271 911,479 911,506 28 1 Duplication28 27 Head-to-tail torA torD 916,781 916,809 916,016 916,019 4 1Duplication + in-frame TAA 29 STOP codon + 20nt insertion 28Head-to-tail ycfM ycfN 995,426 995,469 994,632 994,651 20 1Duplication + in-frame TAA 44 STOP codon + 20nt insertion 29Head-to-tail sapC sapB 1,159,586 1,159,623 1,158,734 1,158,747 14 1Duplication + in-frame TAA 38 STOP codon + 20nt insertion 30Head-to-tail ycjV ymjB 1,186,895 1,186,915 1,186,018 1,186,018 1 1Duplication + in-frame TAA 21 STOP codon + 20nt insertion 31Tail-to-tail rimL ydck 1,212,937 1,212,945 1,212,031 1,212,039 9 1Duplication 9 32 Head-to-tail ddpD ddpC 1,266,751 1,266,779 1,265,8411,265,844 4 1 Duplication + in-frame TAA 29 STOP codon + 20nt insertion33 Head-to-tail ego lsrC 1,310,781 1,310,812 1,309,846 1,309,852 7 1Duplication + in-frame TAA 32 STOP codon + 20nt 
insertion 34Tail-to-tail ydjQ ydjR 1,506,953 1,506,993 1,505,945 1,505,985 41 1Duplication 41 35 Head-to-tail astA astC 1,513,357 1,513,385 1,512,3451,512,348 4 1 Duplication + in-frame TAA 29 STOP codon + 20nt insertion36 Tail-to-tail nudG ynjH 1,524,518 1,524,552 1,523,446 1,523,480 35 1Duplication 35 37 Tail-to-tail yeaL yeaM 1,556,017 1,556,060 1,554,9011,554,944 44 3 Duplication 44 38 Head-to-tail yebS yebT 1,598,7721,598,827 1,597,656 1,597,687 32 2 Duplication + in-frame TAA 56 STOPcodon + 20nt insertion 39 Head-to-tail exoX ptrB 1,608,100 1,608,1001,606,925 1,606,928 4 1 Silent mutation GAC to GAT (Asp) 40 Head-to-tailznuC znuB 1,624,732 1,624,760 1,623,560 1,623,563 4 1 Duplication +in-frame TAA 29 STOP codon + 20nt insertion 41 Head-to-tail otsA otsB1,646,196 1,646,245 1,644,969 1,644,994 26 1 Duplication + in-frame TAA50 STOP codon + 20nt insertion 42 Tail-to-tail yedA vsr 1,668,5261,668,537 1,667,263 1,667,274 12 1 Duplication 12 43 Head-to-tail vsrdcm 1,668,997 1,669,040 1,667,714 1,667,733 20 1 Duplication + in-frameTAA 44 STOP codon + 20nt insertion 44 Tail-to-tail yegV yegW 1,757,5161,757,542 1,756,182 1,756,208 27 2 Duplication 27 45 Head-to-tail yehPyehQ 1,784,581 1,784,609 1,783,247 1,783,250 4 1 Duplication + in-frameTAA 29 STOP codon + 20nt insertion 46 Head-to-tail yeiT yeiA 1,810,7751,810,806 1,809,412 1,809,418 7 1 Duplication + in-frame TAA 32 STOPcodon + 20nt insertion 47 Head-to-tail ccmF ccME 1,866,667 1,866,6951,865,268 1,865,271 4 1 Duplication + in-frame TAA 29 STOP codon + 20ntinsertion 48 Head-to-tail napB napH 1,870,510 1,870,538 1,869,0821,869,085 4 1 Duplication + in-frame TAA 29 STOP codon + 20nt insertion49 Head-to-tail napH napG 1,871,399 1,871,436 1,869,932 1,869,945 14 1Duplication + in-frame TAA 38 STOP codon + 20nt insertion 50Head-to-tail yfbG yfbH 1,941,876 1,941,904 1,940,385 1,940,388 4 1Duplication + in-frame TAA 29 STOP codon + 20nt insertion 51Head-to-tail eutA eutH 2,114,032 2,114,060 2,112,508 2,112,511 4 
1Duplication + in-frame TAA 29 STOP codon + 20nt insertion 52Tail-to-tail csiE hcaT 2,213,892 2,213,900 2,212,334 2,212,342 9 2Duplication 9 53 Head-to-tail yfiM kgtP 2,271,633 2,271,633 2,270,0752,270,078 4 1 Silent mutation CAC to CAT (His) 54 Head-to-tail srlA srlE2,338,485 2,338,513 2,336,927 2,336,930 4 1 Duplication + in-frame TAA29 STOP codon + 20nt insertion 55 Head-to-tail hypB hypC 2,363,9862,364,020 2,362,399 2,362,408 10 1 Duplication + in-frame TAA 35 STOPcodon + 20nt insertion 56 Head-to-tail ygbJ ygbK 2,374,492 2,374,5202,372,870 2,372,873 4 1 Duplication + in-frame TAA 29 STOP codon + 20ntinsertion 57 Head-to-tail ygcN ygcO 2,406,105 2,406,139 2,404,4542,404,463 10 1 Duplication + in-frame TAA 35 STOP codon + 20nt insertion58 Head-to-tail ppdC ygdB 2,474,986 2,475,026 2,473,284 2,473,299 16 1Duplication + in-frame TAA 41 STOP codon + 20nt insertion 59Tail-to-tail lysR ygeA 2,492,219 2,492,232 2,490,478 2,490,491 14 2Duplication 14 60 Head-to-tail hybB hybA 2,626,155 2,626,189 2,624,4032,624,413 11 1 Duplication + in-frame TAA 35 STOP codon + 20nt insertion61 Head-to-tail yraM yraN 2,770,503 2,770,570 2,768,727 2,768,769 43 1Duplication + in-frame TAA 68 STOP codon + 20nt insertion 62Tail-to-tail yhbO yhbP 2,774,670 2,774,690 2,772,805 2,772,825 21 1Duplication 21 63 Head-to-tail mreD mreC 2,868,593 2,868,613 2,866,7272,866,727 1 1 Duplication + 20nt insertion 21 64 Head-to-tail yheT yheU2,938,030 2,938,061 2,936,144 2,936,150 7 1 Duplication + in-frame TAA32 STOP codon + 20nt insertion 65 Tail-to-tail yhhA ugpQ 3,037,3193,037,332 3,035,387 3,035,400 14 1 Duplication 14 66 Head-to-tail nikDnikE 3,067,725 3,067,753 3,065,793 3,065,796 4 1 Duplication + in-frameTAA 29 STOP codon + 20nt insertion 67 Head-to-tail bcsC bcsZ 3,130,0423,130,085 3,128,062 3,128,080 19 1 Duplication + in-frame TAA 44 STOPcodon + 20nt insertion 68 Head-to-tail bcsA yhjQ 3,136,149 3,136,1773,134,140 3,134,143 4 1 Duplication + in-frame TAA 29 STOP codon + 20ntinsertion 69 
Head-to-tail bcsE bcsF 3,138,967 3,138,995 3,136,9333,136,936 4 1 Duplication + in-frame TAA 29 STOP codon + 20nt insertion70 Tail-to-tail yiaC bisC 3,155,063 3,155,094 3,152,968 3,152,999 32 1Duplication 32 71 Head-to-tail xylG xylH 3,173,279 3,173,325 3,171,1843,171,206 23 1 Duplication + in-frame TAA 47 STOP codon + 20nt insertion72 Head-to-tail sgbU sgbE 3,189,692 3,189,723 3,187,550 3,187,556 7 1Duplication + in-frame TAA 32 STOP codon + 20nt insertion 73Tail-to-tail yibQ yibD 3,320,449 3,320,462 3,218,261 3,218,274 14 1Duplication 14 74 Head-to-tail yicG ligB 3,250,889 3,250,889 3,248,7013,248,704 4 1 Silent mutation CAC to CAT (His) 75 Head-to-tail yidG yidH3,287,372 3,287,406 3,285,173 3,285,183 11 1 Duplication + in-frame TAA35 STOP codon + 20nt insertion 76 Head-to-tail cbrA dgoT 3,301,8773,301,877 3,299,651 3,299,654 4 1 Silent mutation GGC to GGT (Gly) 77Head-to-tail rnpA yidD 3,316,252 3,316,313 3,314,029 3,314,065 37 2Duplication + in-frame TAA 62 STOP codon + 20nt insertion 78Tail-to-tail rbsR hsrA 3,370,718 3,370,752 3,368,398 3,368,432 35 4Duplication 35 79 Tail-to-tail yigM metR 3,443,509 3,443,621 3,441,0763,441,188 113 4 Duplication 113 80 Head-to-tail tatD rfaH 3,455,9823,455,982 3,453,546 3,453,549 4 1 Silent mutation CTC to CTT (Leu) 81Head-to-tail cpxA cpxR 3,536,622 3,536,650 3,534,185 3,534,188 4 1Duplication + in-frame TAA 29 STOP codon + 20nt insertion 82Head-to-tail pflD pflC 3,577,933 3,577,991 3,575,471 3,575,505 35 1Duplication + in-frame TAA 59 STOP codon + 20nt insertion 83Tail-to-tail frwD yijO 3,579,214 3,579,227 3,576,679 3,576,692 14 1Duplication 14 84 Head-to-tail murB birA 3,604,830 3,604,858 3,602,2953,602,298 4 1 Duplication + in-frame TAA 29 STOP codon + 20nt insertion85 Head-to-tail zraR purD 3,636,422 3,636,422 3,633,855 3,633,858 4 1Silent mutation AAC to AAT (Asn) 86 Head-to-tail actP yjcH 3,716,6803,716,708 3,714,112 3,714,115 4 1 Duplication + in-frame TAA 29 STOPcodon + 20nt insertion 87 Head-to-tail phnM phnL 
3,748,914 3,748,9423,746,317 3,746,320 4 1 Duplication + in-frame TAA 29 STOP codon + 20ntinsertion 88 Head-to-tail dipZ cutA 3,796,767 3,796,816 3,794,1203,794,144 25 1 Duplication + in-frame TAA 50 STOP codon + 20nt insertion89 Head-to-tail sugE blc 3,808,963 3,808,963 3,806,291 3,806,294 4 1Silent mutation CAC to CAT (His) 90 Head-to-tail yjeF yjeE 3,827,3593,827,411 3,824,687 3,824,715 29 1 Duplication + in-frame TAA 53 STOPcodon + 20nt insertion 91 Tail-to-tail ytfA ytfB 3,859,923 3,859,9393,857,181 3,857,197 17 1 Duplication 17

Example 2—Synthesis of Recoded Sections

We performed a retrosynthesis, analogous to that commonly used for designing synthetic routes to small molecules, on the designed genome (FIGS. 2A-2C). We disconnected the genome into 8 sections, A-H, of approximately 0.5 Mb (FIG. 1D, FIG. 2A, FIG. 18, SEQ ID NO: 1) and then disconnected each section into 4-5 fragments (FIG. 2B). This yielded 37 fragments (FIG. 1D, Table 2) of 91 kb to 136 kb. We placed the boundaries between fragments, and between sections, in intergenic regions between non-essential genes. The fragments were further disconnected into 9-14 stretches of approximately 10 kb (FIG. 2C, Table 2).
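The stretch coordinates in Table 2 can be checked with simple arithmetic: each length equals the end coordinate minus the start coordinate plus one, and consecutive stretches share a short overlap. The sketch below uses the first three rows of Table 2; the helper names are hypothetical, and coordinates are taken to be 1-based and inclusive, which is an assumption consistent with the listed lengths.

```python
# First three stretches of Table 2: (name, 5' start, 3' end).
stretches = [
    ("100k01-01", 83_869, 95_593),
    ("100k01-02", 95_399, 101_629),
    ("100k01-03", 101_435, 112_646),
]


def stretch_length(start, end):
    """Length in bp of a stretch with inclusive coordinates."""
    return end - start + 1


def shared_overlap(previous, following):
    """Number of bp shared by two consecutive stretches."""
    return previous[2] - following[1] + 1


lengths = [stretch_length(start, end) for _, start, end in stretches]
overlaps = [
    shared_overlap(previous, following)
    for previous, following in zip(stretches, stretches[1:])
]
```

For these rows the computed lengths agree with the lengths listed in Table 2, and each pair of neighbouring stretches shares 195 bp.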

We assembled BACs for REXER (FIG. 2C, FIGS. 20A-20D) containing each fragment via homologous recombination in S. cerevisiae (Wang, K. et al., 2016. Nature 539, 59-64; and Kouprina, N., et al., 2004. Methods Mol Biol 255, 69-89). For 36 of the fragments, BAC assembly proceeded smoothly (Table 3). Fragment 37 was challenging to assemble and we therefore split it into two 50 kb fragments (37a and 37b), which were straightforward to assemble (Table 3).

We initiated genome replacement in seven distinct strains, via REXER. The start point for REXER in each strain corresponds to the beginning of sections A, C, D, E, F, G or H (FIG. 1D, FIG. 2B, FIG. 3); section B was subsequently built on section A, as described below. We marked the start point of genome replacement in each strain by the introduction of a cassette bearing a positive and negative selection marker. We introduced Cas9 (Jiang, W., et al., 2013. Nat Biotechnol 31, 233-239), the lambda red recombination machinery (Datsenko, K. A. & Wanner, B. L., 2000. Proc Natl Acad Sci USA 97, 6640-6645), and the BAC containing the first recoded fragment for each section into the relevant strain, and initiated replacement of genomic DNA by the addition of DNA encoding the relevant Cas9 spacers (Jiang, W., et al., 2013. Nat Biotechnol 31, 233-239) to the cells. Cas9 mediated excision of the recoded DNA from the BAC and lambda red mediated recombination of this DNA into the genome led to replacement of a section of genomic DNA with recoded DNA, removal of the positive and negative selection markers from the genome, and introduction of new, orthogonal, positive and negative selection markers. Clones that had recombined over the target region were selected on the basis of having lost the negative selection marker from the genome and gained the positive selection marker from the BAC.

In each strain, the positive and negative selection markers that are introduced in the first REXER provide a template for the next round of REXER, enabling genome stepwise interchange synthesis (GENESIS) (FIG. 2B, FIG. 4). We used plasmid encoded spacers for early rounds of REXER (Table 4, FIG. 20D, FIGS. 21A and 21B). However, we subsequently found that REXER could be initiated by the electroporation of linear double stranded spacers generated by PCR (Table 4, FIG. 21A). Since these spacers do not propagate through cell division, this enabled the cells from one step of REXER to be used more rapidly for the next step of REXER. This advance accelerated GENESIS. For sections A, C, D, E, F, and G we proceeded with GENESIS in a clockwise direction for 4-5 steps of REXER, until we had replaced approximately 0.5 Mb of genomic DNA with synthetic DNA. Because section A was initiated first, and was completed ahead of the other sections, we proceeded with GENESIS through section B upon reaching the end of section A.

Following each REXER we sequenced the resulting genomes to identify cells that were fully recoded over the targeted region of the genome (Table 4). In parallel, we carried out a large number of single step REXERs (Table 4) to rapidly identify 100 kb regions of the genome that may be challenging to recode, before we arrived at them through GENESIS. For 35 of the 38 steps, including all of sections A, C, D, E, F and G, we were able to completely recode the targeted genomic sequence by GENESIS. We only observed incomplete replacement of the corresponding genomic region by synthetic DNA for fragment 9, in section B, and for fragments 37a and 1, in section H (Table 4).

TABLE 2 Table MDS42 10 kb stretches The genomic locations are listed forall of the 10 kb stretches which comprise the designed synthetic MDS42genome. Length Stretch 5′ start . . . 3′end (bp) 100k01-01 83,869 . . .95,593 11725 100k01-02  95,399 . . . 101,629 6231 100k01-03 101,435 . .. 112,646 11212 100k01-04 112,448 . . . 122,780 10333 100k01-05 122,621. . . 132,166 9546 100k01-06 131,945 . . . 144,626 12682 100k01-07144,430 . . . 156,454 12025 100k01-08 156,309 . . . 162,400 6092100k01-09 162,193 . . . 173,408 11216 100k01-10 173,127 . . . 181,7488622 100k02-01 181,678 . . . 191,139 9462 100k02-02 191,016 . . .201,015 10000 100k02-03 200,896 . . . 212,598 11703 100k02-04 212,483 .. . 220,477 7995 100k02-05 220,357 . . . 229,377 9021 100k02-06 229,255. . . 237,503 8249 100k02-07 237,380 . . . 248,528 11149 100k02-08248,409 . . . 259,180 10772 100k02-09 259,061 . . . 269,318 10258100k02-10 269,196 . . . 279,245 10050 100k02-11 279,122 . . . 289,73310612 100k02-12 289,609 . . . 303,206 13598 100k03-01 303,144 . . .313,764 10621 100k03-02 313,641 . . . 325,092 11452 100k03-03 324,973 .. . 334,137 9165 100k03-04 334,018 . . . 343,618 9601 100k03-05 343,499. . . 353,344 9846 100k03-06 353,225 . . . 362,491 9267 100k03-07362,373 . . . 371,912 9540 100k03-08 371,794 . . . 380,649 8856100k03-09 380,534 . . . 393,822 13289 100k03-10 393,703 . . . 405,21411512 100k03-11 405,100 . . . 415,406 10307 100k03-12 415,290 . . .425,574 10285 100k03-13 425,457 . . . 437,443 11987 100k04-01 437,351 .. . 447,358 10008 100k04-02 447,239 . . . 457,565 10327 100k04-03457,446 . . . 466,960 9515 100k04-04 466,841 . . . 476,935 10095100k04-05 476,816 . . . 486,528 9713 100k04-06 486,409 . . . 496,2309822 100k04-07 496,111 . . . 506,009 9899 100k04-08 505,890 . . .515,348 9459 100k04-09 515,231 . . . 525,913 10683 100k04-10 525,799 . .. 532,888 7090 100k05-01 532,792 . . . 543,100 10309 100k05-02 542,981 .. . 555,707 12727 100k05-03 555,591 . . . 566,274 10684 100k05-04566,155 . . . 
576,486 10332 100k05-05 576,367 . . . 588,061 11695100k05-06 587,942 . . . 598,541 10600 100k05-07 598,422 . . . 609,16210741 100k05-08 609,043 . . . 617,744 8702 100k05-09 617,625 . . .628,315 10691 100k05-10 628,200 . . . 637,895 9696 100k06-01 637,794 . .. 648,173 10380 100k06-02 648,059 . . . 658,187 10129 100k06-03 658,075. . . 666,632 8558 100k06-04 666,513 . . . 676,267 9755 100k06-05676,148 . . . 683,859 7712 100k06-06 683,740 . . . 694,050 10311100k06-07 693,931 . . . 705,086 11156 100k06-08 704,967 . . . 716,42811462 100k06-09 716,309 . . . 727,640 11332 100k06-10 727,521 . . .736,154 8634 100k06-11 736,035 . . . 741,978 5944 100k07-01 741,877 . .. 751,411 9535 100k07-02 751,295 . . . 763,017 11723 100k07-03 762,898 .. . 772,642 9745 100k07-04 772,523 . . . 782,523 10001 100k07-05 782,406. . . 794,373 11968 100k07-06 794,255 . . . 804,092 9838 100k07-07803,973 . . . 813,644 9672 100k07-08 813,527 . . . 823,429 9903100k07-09 823,322 . . . 834,999 11678 100k07-10 834,886 . . . 846,33511450 100k08-01 846,246 . . . 856,634 10389 100k08-02 856,515 . . .868,063 11549 100k08-03 867,948 . . . 878,862 10915 100k08-04 878,744 .. . 889,954 11211 100k08-05 889,835 . . . 901,127 11293 100k08-06901,008 . . . 912,978 11971 100k08-07 912,859 . . . 922,812 9954100k08-08 922,693 . . . 933,969 11277 100k08-09 933,850 . . . 939,6935844 100k09-01 939,575 . . . 949,128 9554 100k09-02 949,010 . . .959,384 10375 100k09-03 959,266 . . . 969,156 9891 100k09-04 969,037 . .. 978,088 9052 100k09-05 977,982 . . . 985,362 7381 100k09-06 985,252 .. . 993,763 8512 100k09-07  993,644 . . . 1,002,701 9058 100k09-081,002,582 . . . 1,012,585 10004 100k09-09 1,012,466 . . . 1,022,79210327 100k09-10 1,022,673 . . . 1,032,409 9737 100k09-11 1,032,290 . . .1,041,958 9669 100k09-12 1,041,839 . . . 1,051,279 9441 100k10-011,051,179 . . . 1,059,299 8121 100k10-02 1,059,181 . . . 1,068,249 9069100k10-03 1,068,138 . . . 1,078,645 10508 100k10-04 1,078,526 . . .1,085,635 7110 100k10-05 1,085,516 . . . 
1,096,452 10937 100k10-061,096,333 . . . 1,105,535 9203 100k10-07 1,105,418 . . . 1,116,898 11481100k10-08 1,116,780 . . . 1,128,058 11279 100k10-09 1,127,939 . . .1,138,744 10806 100k10-10 1,138,625 . . . 1,146,843 8219 100k11-011,146,759 . . . 1,156,879 10121 100k11-02 1,156,760 . . . 1,167,59310834 100k11-03 1,167,474 . . . 1,179,239 11766 100k11-04 1,179,121 . .. 1,188,001 8881 100k11-05 1,187,883 . . . 1,195,638 7756 100k11-061,195,519 . . . 1,204,931 9413 100k11-07 1,204,812 . . . 1,215,685 10874100k11-08 1,215,566 . . . 1,224,906 9341 100k11-09 1,224,787 . . .1,234,403 9617 100k11-10 1,234,284 . . . 1,241,004 6721 100k12-011,240,898 . . . 1,250,323 9426 100k12-02 1,250,204 . . . 1,259,727 9524100k12-03 1,259,614 . . . 1,270,832 11219 100k12-04 1,270,713 . . .1,279,720 9008 100k12-05 1,279,601 . . . 1,290,366 10766 100k12-061,290,252 . . . 1,300,202 9951 100k12-07 1,300,085 . . . 1,308,976 8892100k12-08 1,308,863 . . . 1,318,474 9612 100k12-09 1,318,355 . . .1,326,702 8348 100k12-10 1,326,583 . . . 1,337,691 11109 100k12-111,337,572 . . . 1,347,802 10231 100k13-01 1,347,689 . . . 1,357,497 9809100k13-02 1,357,378 . . . 1,369,231 11854 100k13-03 1,369,112 . . .1,378,621 9510 100k13-04 1,378,502 . . . 1,387,714 9213 100k13-051,387,595 . . . 1,396,821 9227 100k13-06 1,396,702 . . . 1,407,244 10543100k13-07 1,407,125 . . . 1,417,810 10686 100k13-08 1,417,698 . . .1,428,675 10978 100k13-09 1,428,564 . . . 1,439,655 11092 100k13-101,439,544 . . . 1,451,233 11690 100k13-11 1,451,116 . . . 1,455,004 3889100k14-01 1,454,886 . . . 1,463,884 8999 100k14-02 1,463,770 . . .1,472,031 8262 100k14-03 1,471,918 . . . 1,482,535 10618 100k14-041,482,417 . . . 1,491,781 9365 100k14-05 1,491,664 . . . 1,501,050 9387100k14-06 1,500,931 . . . 1,508,216 7286 100k14-07 1,508,097 . . .1,515,854 7758 100k14-08 1,515,737 . . . 1,526,355 10619 100k14-091,526,250 . . . 1,535,249 9000 100k14-10 1,535,130 . . . 1,543,987 8858100k14-11 1,543,868 . . . 1,552,890 9023 100k14-12 1,552,774 . . 
.1,564,280 11507 100k15-01 1,564,174 . . . 1,574,973 10800 100k15-021,574,856 . . . 1,586,003 11148 100k15-03 1,585,891 . . . 1,596,79310903 100k15-04 1,596,677 . . . 1,604,287 7611 100k15-05 1,604,170 . . .1,613,369 9200 100k15-06 1,613,258 . . . 1,621,511 8254 100k15-071,621,392 . . . 1,631,869 10478 100k15-08 1,631,750 . . . 1,643,14211393 100k15-09 1,643,023 . . . 1,652,391 9369 100k15-10 1,652,280 . . .1,662,654 10375 100k15-11 1,662,547 . . . 1,667,544 4998 100k16-011,667,429 . . . 1,679,240 11812 100k16-02 1,679,126 . . . 1,690,15311028 100k16-03 1,690,044 . . . 1,700,055 10012 100k16-04 1,699,936 . .. 1,708,018 8083 100k16-05 1,707,899 . . . 1,721,060 13162 100k16-061,720,941 . . . 1,734,097 13157 100k16-07 1,733,974 . . . 1,740,645 6672100k16-08 1,740,525 . . . 1,752,444 11920 100k16-09 1,752,326 . . .1,762,779 10454 100k16-10 1,762,660 . . . 1,771,814 9155 100k16-111,771,695 . . . 1,779,795 8101 100k17-01 1,779,708 . . . 1,790,152 10445100k17-02 1,790,035 . . . 1,799,410 9376 100k17-03 1,799,291 . . .1,809,349 10059 100k17-04 1,809,230 . . . 1,820,280 11051 100k17-051,820,169 . . . 1,830,728 10560 100k17-06 1,830,609 . . . 1,841,56410956 100k17-07 1,841,445 . . . 1,847,824 6380 100k17-08 1,847,705 . . .1,856,025 8321 100k17-09 1,855,909 . . . 1,868,109 12201 100k17-101,867,998 . . . 1,875,399 7402 100k18-01 1,875,300 . . . 1,884,607 9308100k18-02 1,884,488 . . . 1,895,099 10612 100k18-03 1,894,990 . . .1,902,141 7152 100k18-04 1,902,022 . . . 1,912,147 10126 100k18-051,912,028 . . . 1,924,232 12205 100k18-06 1,924,113 . . . 1,935,49111379 100k18-07 1,935,372 . . . 1,948,704 13333 100k18-08 1,948,593 . .. 1,958,709 10117 100k18-09 1,958,599 . . . 1,968,337 9739 100k18-101,968,218 . . . 1,980,692 12475 100k19-01 1,980,585 . . . 1,991,06310479 100k19-02 1,990,945 . . . 2,000,511 9567 100k19-03 2,000,394 . . .2,009,738 9345 100k19-04 2,009,619 . . . 2,021,044 11426 100k19-052,020,925 . . . 2,032,356 11432 100k19-06 2,032,247 . . . 
2,042,77810532 100k19-07 2,042,664 . . . 2,051,421 8758 100k19-08 2,051,315 . . .2,060,546 9232 100k19-09 2,060,429 . . . 2,070,495 10067 100k19-102,070,376 . . . 2,080,816 10441 100k19-11 2,080,701 . . . 2,086,225 5525100k20-01 2,086,123 . . . 2,098,560 12438 100k20-02 2,098,441 . . .2,109,119 10679 100k20-03 2,109,000 . . . 2,119,224 10225 100k20-042,119,107 . . . 2,128,815 9709 100k20-05 2,128,696 . . . 2,140,138 11443100k20-06 2,140,019 . . . 2,148,124 8106 100k20-07 2,148,005 . . .2,159,046 11042 100k20-08 2,158,927 . . . 2,168,048 9122 100k20-092,167,929 . . . 2,176,912 8984 100k21-01 2,176,796 . . . 2,187,752 10957100k21-02 2,187,633 . . . 2,199,463 11831 100k21-03 2,199,344 . . .2,209,310 9967 100k21-04 2,209,193 . . . 2,220,948 11756 100k21-052,220,829 . . . 2,231,253 10425 100k21-06 2,231,134 . . . 2,242,69211559 100k21-07 2,242,573 . . . 2,251,251 8679 100k21-08 2,251,132 . . .2,261,427 10296 100k21-09 2,261,308 . . . 2,271,269 9962 100k21-102,271,152 . . . 2,281,408 10257 100k21-11 2,281,307 . . . 2,288,918 7612100k22-01 2,288,816 . . . 2,298,876 10061 100k22-02 2,298,760 . . .2,308,882 10123 100k22-03 2,308,763 . . . 2,319,092 10330 100k22-042,318,973 . . . 2,329,598 10626 100k22-05 2,329,483 . . . 2,340,58311101 100k22-06 2,340,464 . . . 2,351,317 10854 100k22-07 2,351,225 . .. 2,362,005 10781 100k22-08 2,361,906 . . . 2,372,531 10626 100k22-092,372,430 . . . 2,383,456 11027 100k22-10 2,383,337 . . . 2,394,20810872 100k22-11 2,394,089 . . . 2,404,790 10702 100k23-01 2,404,684 . .. 2,415,521 10838 100k23-02 2,415,402 . . . 2,425,882 10481 100k23-032,425,783 . . . 2,436,334 10552 100k23-04 2,436,215 . . . 2,445,909 9695100k23-05 2,445,795 . . . 2,455,395 9601 100k23-06 2,455,304 . . .2,465,797 10494 100k23-07 2,465,678 . . . 2,476,456 10779 100k23-082,476,337 . . . 2,484,906 8570 100k23-09 2,484,787 . . . 2,494,483 9697100k23-10 2,494,384 . . . 2,504,089 9706 100k24-01 2,504,021 . . .2,514,161 10141 100k24-02 2,514,042 . . . 
2,522,657 8616 100k24-032,522,558 . . . 2,532,585 10028 100k24-04 2,532,466 . . . 2,542,012 9547100k24-05 2,541,893 . . . 2,551,511 9619 100k24-06 2,551,392 . . .2,560,716 9325 100k24-07 2,560,597 . . . 2,571,096 10500 100k24-082,570,983 . . . 2,582,088 11106 100k24-09 2,581,969 . . . 2,591,097 9129100k24-10 2,590,993 . . . 2,600,564 9572 100k25-01 2,600,470 . . .2,610,521 10052 100k25-02 2,610,402 . . . 2,620,532 10131 100k25-032,620,433 . . . 2,630,974 10542 100k25-04 2,630,855 . . . 2,640,90910055 100k25-05 2,640,790 . . . 2,651,714 10925 100k25-06 2,651,615 . .. 2,663,606 11992 100k25-07 2,663,487 . . . 2,676,074 12588 100k25-082,675,955 . . . 2,684,604 8650 100k25-09 2,684,486 . . . 2,694,189 9704100k25-10 2,694,070 . . . 2,702,813 8744 100k26-01 2,702,720 . . .2,713,409 10690 100k26-02 2,713,290 . . . 2,723,932 10643 100k26-032,723,813 . . . 2,734,707 10895 100k26-04 2,734,609 . . . 2,744,64510037 100k26-05 2,744,565 . . . 2,755,298 10734 100k26-06 2,755,179 . .. 2,763,894 8716 100k26-07 2,763,778 . . . 2,774,027 10250 100k26-082,773,908 . . . 2,784,122 10215 100k26-09 2,784,005 . . . 2,793,207 9203100k26-10 2,793,088 . . . 2,802,862 9775 100k26-11 2,802,743 . . .2,812,001 9259 100k26-12 2,811,882 . . . 2,821,709 9828 100k27-012,821,611 . . . 2,829,258 7648 100k27-02 2,829,139 . . . 2,840,747 11609100k27-03 2,840,629 . . . 2,850,303 9675 100k27-04 2,850,184 . . .2,861,747 11564 100k27-05 2,861,628 . . . 2,874,224 12597 100k27-062,874,125 . . . 2,883,204 9080 100k27-07 2,883,085 . . . 2,892,886 9802100k27-08 2,892,767 . . . 2,903,307 10541 100k27-09 2,903,192 . . .2,912,470 9279 100k27-10 2,912,359 . . . 2,925,141 12783 100k27-112,925,022 . . . 2,934,913 9892 100k27-12 2,934,794 . . . 2,947,632 12839100k28-01 2,947,528 . . . 2,958,629 11102 100k28-02 2,958,510 . . .2,969,760 11251 100k28-03 2,969,641 . . . 2,979,981 10341 100k28-042,979,863 . . . 2,991,128 11266 100k28-05 2,991,016 . . . 3,001,64710632 100k28-06 3,001,530 . . . 
3,011,921 10392 100k28-07 3,011,802 . .. 3,017,818 6017 100k28-08 3,017,699 . . . 3,029,508 11810 100k28-093,029,389 . . . 3,040,739 11351 100k28-10 3,040,621 . . . 3,049,609 8989100k28-11 3,049,490 . . . 3,061,680 12191 100k28-12 3,061,561 . . .3,073,892 12332 100k28-13 3,073,773 . . . 3,083,864 10092 100k29-013,083,760 . . . 3,093,964 10205 100k29-02 3,093,855 . . . 3,104,40110547 100k29-03 3,104,282 . . . 3,115,243 10962 100k29-04 3,115,124 . .. 3,126,447 11324 100k29-05 3,126,328 . . . 3,137,036 10709 100k29-063,136,946 . . . 3,146,763 9818 100k29-07 3,146,648 . . . 3,157,292 10645100k29-08 3,157,193 . . . 3,166,872 9680 100k29-09 3,166,754 . . .3,176,818 10065 100k29-10 3,176,729 . . . 3,190,320 13592 100k29-113,190,200 . . . 3,197,411 7212 100k29-12 3,197,292 . . . 3,205,624 8333100k30-01 3,205,520 . . . 3,215,838 10319 100k30-02 3,215,720 . . .3,223,955 8236 100k30-03 3,223,836 . . . 3,232,308 8473 100k30-04 3,232,188 . . . 3,242,559 10372 100k30-05 3,242,448 . . . 3,252,486 10039100k30-06 3,252,362 . . . 3,261,740 9379 100k30-07 3,261,617 . . .3,271,913 10297 100k30-08 3,271,802 . . . 3,282,128 10327 100k30-093,282,026 . . . 3,292,438 10413 100k30-10 3,292,317 . . . 3,301,878 9562100k30-11 3,301,760 . . . 3,308,902 7143 100k30-12 3,308,784 . . .3,319,704 10921 100k31-01 3,319,643 . . . 3,330,096 10454 100k31-023,329,973 . . . 3,339,866 9894 100k31-03 3,339,748 . . . 3,347,473 7726100k31-04 3,347,354 . . . 3,353,926 6573 100k31-05 3,353,798 . . .3,358,503 4706 100k31-06 3,358,382 . . . 3,364,683 6302 100k31-073,364,562 . . . 3,372,812 8251 100k31-08 3,372,694 . . . 3,381,488 8795100k31-09 3,381,367 . . . 3,391,350 9984 100k31-10 3,391,231 . . .3,397,632 6402 100k31-11 3,397,509 . . . 3,405,953 8445 100k31-123,405,834 . . . 3,412,263 6430 100k32-01 3,412,160 . . . 3,425,218 13059100k32-02 3,425,094 . . . 3,436,233 11140 100k32-03 3,436,117 . . .3,447,693 11577 100k32-04 3,447,587 . . . 3,458,871 11285 100k32-053,458,754 . . . 
3,473,651 14898 100k32-06 3,473,525 . . . 3,485,08211558 100k32-07 3,484,960 . . . 3,495,175 10216 100k32-08 3,495,050 . .. 3,505,175 10126 100k32-09 3,505,056 . . . 3,511,192 6137 100k32-103,511,087 . . . 3,521,547 10461 100k33-01 3,521,422 . . . 3,532,17510754 100k33-02 3,532,076 . . . 3,542,259 10184 100k33-03 3,542,143 . .. 3,552,029 9887 100k33-04 3,551,907 . . . 3,560,073 8167 100k33-053,559,950 . . . 3,569,315 9366 100k33-06 3,569,198 . . . 3,580,065 10868100k33-07 3,579,946 . . . 3,589,870 9925 100k33-08 3,589,750 . . .3,598,037 8288 100k33-09 3,597,917 . . . 3,608,905 10989 100k33-103,608,783 . . . 3,621,964 13182 100k33-11 3,621,843 . . . 3,631,89210050 100k34-01 3,631,790 . . . 3,639,310 7521 100k34-02 3,639,199 . . .3,648,860 9662 100k34-03 3,648,743 . . . 3,659,291 10549 100k34-043,659,171 . . . 3,667,138 7968 100k34-05 3,667,024 . . . 3,676,694 9671100k34-06 3,676,571 . . . 3,684,078 7508 100k34-07 3,683,940 . . .3,692,892 8953 100k34-08 3,692,772 . . . 3,702,686 9915 100k34-093,702,582 . . . 3,711,607 9026 100k34-10 3,711,488 . . . 3,719,178 7691100k34-11 3,719,064 . . . 3,726,219 7156 100k35-01 3,726,119 . . .3,735,813 9695 100k35-02 3,735,698 . . . 3,745,893 10196 100k35-033,745,767 . . . 3,756,900 11134 100k35-04 3,756,781 . . . 3,767,31510535 100k35-05 3,767,195 . . . 3,776,515 9321 100k35-06 3,776,395 . . .3,786,445 10051 100k35-07 3,786,327 . . . 3,797,101 10775 100k35-083,796,976 . . . 3,806,009 9034 100k35-09 3,805,910 . . . 3,815,399 9490100k35-10 3,815,281 . . . 3,823,285 8005 100k35-11 3,823,166 . . .3,832,032 8867 100k35-12 3,831,909 . . . 3,837,828 5920 100k36-013,837,736 . . . 3,847,670 9935 100k36-02 3,847,551 . . . 3,856,620 9070100k36-03 3,856,499 . . . 3,865,869 9371 100k36-04 3,865,746 . . .3,874,026 8281 100k36-05 3,873,919 . . . 3,880,887 6969 100k36-063,880,768 . . . 3,891,155 10388 100k36-07 3,891,032 . . . 3,899,094 8063100k36-08 3,898,973 . . . 3,909,104 10132 100k36-09 3,908,980 . . .3,916,124 7145 100k36-10 3,916,005 . . . 
3,926,250 10246 100k36-113,926,131 . . . 3,933,174 7044 100k36-12 3,933,053 . . . 3,942,133 9081100k36-13 3,942,027 . . . 3,948,320 6294 100k37-01 3,948,216 . . .3,958,890 10675 100k37-02 3,958,767 . . . 3,967,811 9045 100k37-033,967,690 . . . 3,977,596 9907 100k37-04 3,977,471 . . . 9193     10660100k37-05   9077 . . . 15,244 6168 100k37-06 15,125 . . . 22,052 6928100k37-07 21,933 . . . 29,499 7567 100k37-08 29,374 . . . 36,759 7386100k37-09 36,643 . . . 45,184 8542 100k37-10 45,085 . . . 53,037 7953100k37-11 52,911 . . . 61,413 8503 100k37-12 61,285 . . . 70,337 9053100k37-13 70,221 . . . 78,586 8366 100k37-14 78,465 . . . 83,922 5458

TABLE 3
Table of BAC Assemblies
Success rate of BAC assembly in yeast, followed by transformation into E. coli and verification by NGS.
Section  Fragment  # of 10 kb stretches  # of junctions genotyped  Yeast genotyped clones (correct/total)  E. coli sequence verified BACs (correct/total)
H  1  10  8  4/4  4/4
H  2  12  7  17/23  5/11
H  3  13  0  1/1  1/1
A  4  10  11  7/30  2/3
A  5  10  5  23/24  2/4
A  6  11  7  6/15  1/4
A  7  10  2  16/24  1/4
A  8  9  6  13/15  1/6
B  9  12  5/8
B  10  10  5  9/22  1/4
B  11  10  6  8/8  1/4
B  12  11  12  3/4  1/3
B  13  11  6  11/22  6/11
C  14  12  7  12/12  4/4
C  15  11  7  11/12  4/4
C  16  11  4/4
C  17  10  6  9/15  3/4
C  18  10  11  7/8  1/7
D  19  11  12  4/24  1/3
D  20  9  1/3
D  21  11  12  3/16  3/3
D  22  11  10  3/24  2/3
D  23  10  11  4/11  2/4
E  24  10  11  11/11  3/4
E  25  10  10  5/24  1/3
E  26  12  11  6/7  4/4
E  27  12  5  8/24  3/5
E  28  13  9  4/24  1/4
F  29  12  13  8/24  1/8
F  30  12  9  6/22  1/1
F  31  12  12  7/8  6/8
F  32  9  9  8/24  1/4
G  33  12  13  6/32  2/4
G  34  11  12  8/24  3/5
G  35  12  7  5/24  2/3
G  36  13  14  4/48  1/1
H  37  14  1  0/56
H  37a  7  7  10/16  3/3
H  37b  7  7  1/16  1/1

TABLE 4
Table of REXER experiments
Individual or sequential integration of synthetic fragments into the genome by REXER. The table indicates the success rate of each integration, and details which spacers and markers were employed.

Individual REXER
Sect.  Frag.  recoded/total  Spacers (Lin / Circ / 2nd gen)  Markers 3′ to synthetic DNA  Markers on BAC  Comments
H  1  0/6, (2/7)**  x  sacB-CmR  rpsL  **After altering refactoring of ftsI-murE and recoding of map.
H  2  1/5  x  rpsL-KanR  pheS*-HygR
H  3  1/1  x  sacB-CmR  rpsL
A  4  1/6  x  rpsL-KanR  sacB
A  5  3/6  x  sacB-CmR  rpsL
A  6  rpsL-KanR  pheS*-HygR
A  7  3/6  x  sacB-CmR  rpsL
A  8  rpsL-KanR  pheS*-HygR
B  9  sacB-CmR  rpsL
B  10  rpsL-KanR  pheS*-HygR
B  11  1/2  x  sacB-CmR  rpsL
B  12  2/4  x  rpsL-KanR  pheS*-HygR
B  13  2/4  x  sacB-CmR  rpsL
C  14  5/8  x  rpsL-KanR  sacB
C  15  sacB-CmR  rpsL
C  16  rpsL-KanR  pheS*-HygR
C  17  sacB-CmR  rpsL
C  18  1/2  x  rpsL-KanR  sacB
D  19  7/9  x  sacB-CmR  rpsL
D  20  rpsL-KanR  sacB
D  21  3/5  x  sacB-CmR  rpsL
D  22  6/6  x  rpsL-KanR  pheS*-HygR
D  23  6/6  x  sacB-CmR  rpsL
E  24  2/7  x  rpsL-KanR  pheS*-HygR
E  25  1/3  x  sacB-CmR  rpsL
E  26  2/3  x  rpsL-KanR  pheS*-HygR
E  27  1/8  x  sacB-CmR  rpsL  Point mutation in non-essential gene
E  28  2/7  x  rpsL-KanR  pheS*-HygR  Point mutation in non-essential gene, introducing STOP codon
F  29  6/6  x  sacB-CmR  rpsL
F  30  rpsL-KanR  pheS*-HygR
F  31  2/5  x  sacB-CmR  rpsL
F  32  rpsL-KanR  pheS*-HygR
G  33  4/8  x  sacB-CmR  rpsL
G  34  3/5  x  rpsL-KanR  pheS*-HygR
G  35  sacB-CmR  rpsL
G  36  rpsL-KanR  pheS*-HygR
H  37a  0/6, (1/7)#  x  sacB-CmR  rpsL  #After recoding of yaaY
H  37b  3/6  x  rpsL-KanR  pheS*-AprR  Point mutation in non-essential gene

Sequential REXER
Sect.  Frag.  recoded/total  Spacers (Lin / Circ / 2nd gen / 2nd gen REXER4)  Markers 3′ to synthetic DNA  Markers on BAC  Comments
H  1  sacB-CmR  rpsL
H  2  2/7  x  rpsL-KanR  pheS*-HygR
H  3  3/5  x  sacB-CmR  rpsL
A  4  rpsL-KanR  sacB
A  5  3/6  x  sacB-CmR  rpsL
A  6  4/6  x  rpsL-KanR  pheS*-HygR
A  7  5/8  x  sacB-CmR  rpsL
A  8  3/6  x  rpsL-KanR  pheS*-HygR
B  9  0/29, (4/5)#  x  sacB-CmR  rpsL  #After altering recoding of yceQ.
B  10  1/8  rpsL-KanR  pheS*-HygR
B  11  2/6  x  sacB-CmR  rpsL
B  12  1/6  - - - -  rpsL-KanR  pheS*-HygR  Conjugated into 4-11 from individual 100k12 strain
B  13  7/8  x  sacB-CmR  rpsL
C  14  rpsL-KanR  sacB
C  15  3/5  x  sacB-CmR  rpsL
C  16  4/9  rpsL-KanR  pheS*-HygR
C  17  4/8  x  sacB-CmR  rpsL
C  18  5/10  x  rpsL-KanR  sacB
D  19  sacB-CmR  rpsL
D  20  3/4  x  rpsL-KanR  sacB  3rd gen REXER4
D  21  1/7  x  sacB-CmR  rpsL  Required Streptomycin at 4000 ug/mL
D  22  6/6  x  rpsL-KanR  pheS*-HygR
D  23  4/6  x  sacB-CmR  rpsL
E  24  rpsL-KanR  pheS*-HygR
E  25  2/6  x  sacB-CmR  rpsL
E  26  4/6  x  rpsL-KanR  pheS*-HygR
E  27  3/6  x  sacB-CmR  rpsL
E  28  3/8  x  rpsL-KanR  pheS*-HygR
F  29  sacB-CmR  rpsL
F  30  2/3  x  rpsL-KanR  pheS*-HygR
F  31  2/10  x  sacB-CmR  rpsL
F  32  4/4  x  rpsL-KanR  pheS*-HygR
G  33  sacB-CmR  rpsL
G  34  1/8  x  rpsL-KanR  pheS*-HygR
G  35  6/6  x  sacB-CmR  rpsL
G  36  3/7  x  rpsL-KanR  pheS*-HygR
H  37a  sacB-CmR  rpsL
H  37b  3/5  x  rpsL-KanR  pheS*-AprR

Example 3—Identifying and Repairing Design Flaws

Sequencing several clones following REXER allows us to score the frequency with which each target codon is recoded and thereby compile a recoding landscape for the genomic region. From the recoding landscape with fragment 1 we directly identified the fourth codon (Ser4, TCA) in map, an essential gene encoding methionine aminopeptidase, as recalcitrant to recoding by our defined scheme (FIG. 5A). We also identified a second region, which encompasses a 14 bp overlap of the essential genes ftsI and murE, and several serine codons in ftsI and murE, which was not replaced by our recoded and refactored sequence. Since we have previously recoded this region with the same recoding scheme, when duplicating the overlap plus 182 bp rather than the 20 bp used here (Wang, K. et al., 2016. Nature 539, 59-64) (FIG. 1C), we conclude that the defect in the synthetic DNA for this region is in its refactoring rather than in its recoding. REXER with a new fragment 1 BAC, which contained both the extended refactoring (FIG. 5B) and a TCA to TCT mutation at Ser4 in map (FIG. 5C, Table 5), enabled complete recoding of the targeted 100 kb region of the genome (FIG. 5D).
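The scoring step described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline used in this work; the function names, the input layout (one dictionary per sequenced clone, mapping a target codon position to whether that codon was recoded) and the example positions are all hypothetical.

```python
from typing import Dict, List


def recoding_landscape(clones: List[Dict[int, bool]]) -> Dict[int, float]:
    """Fraction of sequenced clones in which each target codon is recoded.

    Each clone is a hypothetical mapping: target codon position -> recoded?
    """
    positions = sorted({pos for clone in clones for pos in clone})
    landscape = {}
    for pos in positions:
        # Only count clones that actually have a call at this position.
        calls = [clone[pos] for clone in clones if pos in clone]
        landscape[pos] = sum(calls) / len(calls)
    return landscape


def never_recoded(landscape: Dict[int, float]) -> List[int]:
    """Positions whose recoding frequency is zero across all clones."""
    return [pos for pos, freq in landscape.items() if freq == 0.0]


# Illustrative, made-up calls for two clones at three target codons.
example_clones = [
    {100: True, 200: False, 300: True},
    {100: True, 200: False, 300: False},
]
example_landscape = recoding_landscape(example_clones)
```

A position that is recoded in no clone, such as position 200 in the made-up example, corresponds to a minimum in the landscape and would be a candidate for a design flaw.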

From the post-REXER recoding landscape for fragment 9 we identified a 26 kb genomic region that was never recoded (FIGS. 6A-6D). Efforts to delete 10 kb regions of the genome within and around this region, in the presence of a BAC containing recoded fragment 9, narrowed down the region that was challenging to recode to 10 kb of the genome. REXER across the 10 kb genomic region revealed a minimum within the resulting recoding landscape at yceQ. This identified the five target codons within yceQ as problematic to recode. Similarly, the recoding landscape following REXER with fragment 37a, followed by further sequencing, allowed us to identify a single codon at the 3′ end of yaaY, which was never recoded (FIGS. 7A-7D).

yceQ and yaaY both encode ‘predicted proteins’, multiple insertions in yceQ are viable, and there is no evidence of mRNA production and/or protein synthesis from these predicted genes (Pundir, S., et al., 2017. Methods Mol Biol 1558, 41-55). Notably, the codons that are recalcitrant to recoding within yceQ and yaaY all lie within the 5′ untranslated regions (UTRs) of essential genes. We suggest that the sequence changes introduced by recoding yceQ and yaaY negatively affect the regulation of the adjacent essential genes. Indeed, the target codons in yceQ map to RNA secondary structures and promoter elements within the 5′ UTR of rne (encoding the essential ribonuclease RNase E) (FIGS. 8A and 8B) and these sequences are essential for controlling RNase E homeostasis (Schuck, A., et al. 2009. Mol Microbiol 72, 470-478).

We fixed fragment 9 by introducing a stop codon into the 5′ sequence of yceQ; this minimizes any potential translation but retains the native sequence for regulating rne transcription (FIGS. 6A-6D, Table 5). REXER with this new BAC led to complete recoding of the corresponding 100 kb genomic region (FIGS. 6A-6D, Table 5). REXER with a new BAC, containing fragment 37a with a TCA to AGC substitution at the problematic codon in yaaY, led to complete recoding of the corresponding region of the genome (FIGS. 7A-7D, Table 5).

Having pinpointed and fixed all the initially problematic sequences, we completed the assembly of a strain in which sections A and B are fully recoded (FIGS. 9A and 9B), and the assembly of a strain in which section H is entirely recoded (Table 5, FIGS. 9A and 9B). This completed the assembly of all the sections in seven distinct strains.

TABLE 5 Alternative recoding mutagenesis oligosTable of oligonucleotides used for site directed mutagenesisapproaches to identify alternative viable recoding solutions. TargetOligo F Oligo R Frag. gene Purpose (5′ -> 3′) (5′ -> 3′) Template 1ftsI-murE Integrate aaaatgaatttgtgattaat gcgtctggcacccacggagc pheS*-HygRpheS*- caaggcgaggggacaggtgg aagaaggtcgcgcaaattac HygRcagaagctaaAAGCTTGAGC gatctgccacTTATTCCTTT ACGTGTTGACAATTAATCATGCCCTCGGACGAGTGCTGG CGG (SEQ ID NO: 328) (SEQ ID NO: 327) 1 ftsI-murESer4 AGT Aaaaaggtcgggccggacgg gcaataatggcagccacacc Synthetic tc ttgDNA Ser r.s.3 (SEQ ID NO: 329) (SEQ ID NO: 330) (Wang et al.,Nature 2016) 1 map Integrate gcgacgcgcattttttcgat ggcacttacatatatattgtpheS*-HygR pheS*- atcttctggggtcttgatTG cggtatcaccgacgctgatg HygRAgatagccatGATTGTCCTC gacagaattaAAGCTTGAGC CTTATTCCTTTGCCCTCGGAACGTGTTGACAATTAATCAT CGAGTGCTGG CGG (SEQ ID NO: 331) (SEQ ID NO: 332) 1map Ser4 AGT cacttcggcagccagtcggc taacgggtctggtgaccgaa MDS42 wtcagcgacgcgcattttttcg gtgaac atatcttctggggtcttgat (SEQ ID NO: 334)ACTgatagccattaattctg tccatcagcgtcggtgatac cgac (SEQ ID NO: 333) 1 mapSer4 AGC cacttcggcagccagtcggc taacgggtctggtgaccgaa MDS42 wtcagcgacgcgcattttttcg gtgaac atatcttctggggtcttgat (SEQ ID NO: 336)GCTgatagccattaattctg tccatcagcgtcggtgatac cgac (SEQ ID NO: 335) 1 mapSer4 TCT cacttcggcagccagtcggc taacgggtctggtgaccgaa MDS42 wtcagcgacgcgcattttttcg gtgaac atatcttctggggtcttgat (SEQ ID NO: 338)AGAgatagccattaattctg tccatcagcgtcggtgatac cgac (SEQ ID NO: 337) 1 mapSer4 TCC cacttcggcagccagtcggc taacgggtctggtgaccgaa MDS42 wtcagcgacgcgcattttttcg gtgaac atatcttctggggtcttgat (SEQ ID NO: 340)GGAgatagccattaattctg tccatcagcgtcggtgatac cgac (SEQ ID NO: 339) 1 mapSer4 ACA cacttcggcagccagtcggc taacgggtctggtgaccgaa MDS42 wtcagcgacgcgcattttttcg gtgaac atatcttctggggtcttgat (SEQ ID NO: 342)TGTgatagccattaattctg tccatcagcgtcggtgatac cgac (SEQ ID NO: 341) 1 mapSer4 TTA cacttcggcagccagtcggc taacgggtctggtgaccgaa MDS42 wtcagcgacgcgcattttttcg gtgaac atatcttctggggtcttgat (SEQ ID NO: 
344)TAAgatagccattaattctg tccatcagcgtcggtgatac cgac (SEQ ID NO: 343) 9 yceQIntegrate gtcgcgtcgccaacctcacg tgataaatggtaaaagtcat pheS*-HygR pheS*-gttatcgtcagctcaaagag cttgctataacaaggcttgc HygR gcgcagagtgAAGCTTGAGCagtggaataaTTATTCCTTT ACGTGTTGACAATTAATCAT GCCCTCGGACGAGTGCTGG CGG(SEQ ID NO: 346) (SEQ ID NO: 345) 9 yceQ Ser2 TGA, ctcgtgtctagtcgcgtcgccaccagcaagaagtgaaaaa MDS42 wt Ser7+15+5 caacctcacggttatcgtcaactgtgagtaagc 7κWT gctcaaagaggcgcagagtg (SEQ ID NO: 348)TGAgttgcccgttttTCAtg cggaaaaacagcgcaattaT CAaaga (SEQ ID NO: 347) 37ayaaY Integrate tgattagcgtactcaatcgc agattatgtatgccgcgtat pheS*-HygRpheS*- cggttaaccttgaccgctgt cagcttcatgtctggctcaa HygRacaaggtataAAGCTTGAGC aacagTGAaaatcgtccgag ACGTGTTGACAATTAATCATTTATTCCTTTGCCCTCGGAC CGG GAGTGCTGG (SEQ ID NO: 349) (SEQ ID NO: 350) 37ayaaY Ser70 AGC aaaggtgaagacaaagccgc ggctgagattatgtatgccg Partiallytatcgaag cgtatcagcttcatgtctgg recoded (SEQ ID NO: 351)ctcaaaacagGCTaaatcgt clone ccgagtataccttgtacagc ggtcaaggttaac(SEQ ID NO: 352) 37a yaaY Ser70 AGT aaaggtgaagacaaagccgcggctgagattatgtatgccg Partially tatcgaag cgtatcagcttcatgtctgg recoded(SEQ ID NO: 353) ctcaaaacagACTaaatcgt clone ccgagtataccttgtacagcggtcaaggttaac (SEQ ID NO: 354) 37a yaaY Ser70 TCC aaaggtgaagacaaagccgcggctgagattatgtatgccg Partially tatcgaag cgtatcagcttcatgtctgg recoded(SEQ ID NO: 355) ctcaaaacagGGAaaatcgt clone ccgagtataccttgtacagcggtcaaggttaac (SEQ ID NO: 356) 37a yaaY Ser70 TCG aaaggtgaagacaaagccgcggctgagattatgtatgccg Partially tatcgaag cgtatcagcttcatgtctgg recoded(SEQ ID NO: 357) ctcaaaacagCGAaaatcgt clone ccgagtataccttgtacagcggtcaaggttaac (SEQ ID NO: 358) 37a yaaY Ser70 TCT aaaggtgaagacaaagccgcggctgagattatgtatgccg Partially tatcgaag cgtatcagcttcatgtctgg recoded(SEQ ID NO: 359) ctcaaaacagAGAaaatcgt clone ccgagtataccttgtacagcggtcaaggttaac (SEQ ID NO: 360)

Example 4—Assembly of a Recoded Genome

We developed a conjugation-based strategy (Isaacs, F. J. et al., 2011. Science 333, 348-353; Ma, N. J., et al., 2014. Nat Protoc 9, 2285-2300; and Lederberg, J. & Tatum, E. L., 1946. Nature 158, 558) to assemble the recoded sections into a single genome. Our strategy assembles the recoded genome in a clockwise manner by conjugating recoded ‘donor’ sections, containing the origin of transfer (oriT), into their adjacent recoded ‘recipient’ sections, which have been extended to provide homology to the donor (FIG. 10, FIG. 11A, FIGS. 22A and 22B). This generates a new genome that contains the recoded sections of both the donor and the recipient. The cells containing this new genome can then be used as a recipient for the next recoded donor, and iteration of the process enables the recoded genome to be assembled through the addition of recoded sections to an increasingly recoded recipient (FIG. 10, FIGS. 11A and 11B). Donor cells contained a version of the F′ plasmid that facilitates transfer of the donor genome to the recipient cells but, unlike standard F′ plasmids, is not competent to transfer itself to recipient cells (FIG. 22C); as a result this F′ plasmid does not have to be lost from the recipient cells after every conjugation. This accelerated our workflow.

We initiated conjugation by mixing donor and recipient cells, and varied the time and conditions of conjugation to control the extent of genome transfer from the donor to the recipient. Following conjugation between the donor and the recipient cells, we selected for recipient cells, and then for those recipients that had gained the positive marker at the end of the recoded sequence from the donor, and lost the negative marker at the end of the extension in the recipient (FIG. 11A).

We performed a convergent synthesis of a genome recoded through sections A-E (FIG. 10, FIG. 11B). We then used the A-E strain as a recipient for F, generating a recoded strain, A-F. A-F was then used as a recipient for F-G, generating A-G; this conjugation used a much longer shared recoded sequence (0.4 Mb) between the donor and recipient strains to increase conjugation efficiency.

To create a completely recoded genome we first created a recipient strain by introducing 37a and 37b into A-G to create A-G-37ab (providing a 115 kb homology region with the final donor). We created the final donor strain by conjugation between strain H and strain AB, which yielded strain H-A-09, in which H, A and fragment 9 from section B are recoded (FIG. 10, FIG. 11B). The additional sequence from A and B was added to H to ensure that we did not erase the recoding in A in the final conjugation. The final conjugation between the H-A-09 donor strain and the A-G-37ab recipient strain led to the synthesis of an E. coli strain, which we name E. coli Syn61, in which all 1.8×10⁴ target codons in the genome are recoded (FIG. 19, SEQ ID NO: 2). The synthesis of our recoded genome introduced only eight non-programmed mutations (Table 6); four of these mutations arose during the preparation of the 100 kb BACs, and four during the recoding process.

TABLE 6
Differences between initial design and Syn61 sequence
Table of design optimizations and non-programmed mutations. At 7 target codons we could not implement our defined recoding scheme. For the final genome we found viable alternative codons that were in accordance with our recoding scheme and a refactoring solution for a problematic recoding area in fragment 1. Additionally, we assembled 8 single nucleotide mutations in the final genome, which arose either in preparation of the 100 kb BACs or during recoding.

Section  Fragment  Position*  Original design  Final genome  Consequence  Origin

Design optimisations
H  37a  16,213  AGT  AGC  Viable recoding of S70 in yaaY
H  1  88,037  1 nt + TAA + 20 nt  4 nt + TGA + 182 nt  Viable separation of ftsI and murE
H  1  178,509  AGT  TCT  Viable recoding of S4 in map
B  9  976,671  AGC  TGA  Disruption of pseudogene yceQ to preserve viable expression of rne
B  9  976,686  AGT  TCA
B  9  976,710  AGT  TCA
B  9  976,836  AGC  TCG
B  9  976,899  AGT  TCA

Non-programmed mutations
H  37b  53,145  G  A  Intergenic region  In DNA synthesis or BAC assembly
C  15  1,579,495  C  T  D434D in sdaA (non-essential gene)  In DNA synthesis or BAC assembly
D  21  2,288,863  T  -  Deletion in yfiL (non-essential gene)  During recoding
E  27  2,885,875  A  G  T369A in acrF (non-essential gene)  In transfer from DH10b to MDS42
E  28  3,031,081  C  A  S119I in gntK (non-essential gene)  In DNA synthesis or BAC assembly
F  30  3,252,858  T  C  S10S in gmK (essential gene)  During recoding
F  30  3,252,920  A  G  Y31C in gmK (essential gene)  During recoding
F  30  3,319,703  A  G  Intergenic region  During recoding

*Position in designed genome (Supplementary data 2).

Example 5—Consequences of Synonymous Codon Compression in Syn61

The doubling time of Syn61 was only 1.6 times longer than that of MDS42 in LB plus glucose at 37° C.; this ratio increased at 25° C. and decreased at 42° C. (FIG. 13A). Syn61 contains 65% more AGT and AGC codons than MDS42, but providing additional copies of serV, the tRNA that decodes these codons (FIG. 12A), did not increase growth (FIG. 13A); this suggests serV is not limiting. Imaging Syn61 cells suggests they are slightly longer than MDS42 cells (FIGS. 13B and 13C). The proteome of Syn61 was comparable to that of MDS42 (FIG. 13D). Co-translational incorporation of a non-canonical amino acid, using an orthogonal aminoacyl-tRNA synthetase/tRNA_(CGA) pair targeted to TCG codons, was extremely toxic in MDS42 but completely non-toxic in Syn61, providing phenotypic validation for the removal of TCG codons in Syn61 (FIG. 12B). This approach also provided additional insights (FIGS. 14A-14C). serT, encoding tRNA^(Ser)_(UGA), is the only tRNA that decodes TCA codons in E. coli, and is therefore essential. Since Syn61 does not contain TCA codons, serT should be dispensable in our strain. Indeed, we demonstrated that we could easily remove serT (FIG. 12C, FIG. 14D, FIG. 23), as well as serU and prfA, in Syn61 (FIGS. 14E and 14F, FIG. 23). These data provide functional confirmation that we have removed the target codons from the genome, show that the tRNAs and release factor that decode the target codons can be removed in Syn61, and demonstrate unique properties of Syn61 that arise from recoding.

Example 6—Discussion

We have created E. coli in which we have replaced the entire 4 Mb genome with synthetic DNA; the scale of genomic replacement in our experiments is approximately 4 times larger than previously reported for genome replacement in mycoplasma or chromosome replacement in a single strain of S. cerevisiae (FIG. 15A).

We have demonstrated the genome-wide removal of all 1.8×10⁴ known target codons (the two sense codons TCG and TCA, and the amber stop codon TAG) in a single strain of E. coli. Our work removes 60 times more codons than experiments removing the amber stop codons by site-directed mutagenesis (FIG. 15B). Moreover, it demonstrates complete, genome-wide recoding of all targeted sense codons (FIG. 15B). Thus, we have created a synthetic organism that uses 61 codons instead of the normal 64. The new organism uses a reduced number of sense codons to encode the 20 canonical amino acids.

Our synthetic genome contains only 2×10⁻⁴ non-programmed mutations per target codon (FIG. 15C). This compares favorably to 1.05 non-programmed mutations per target codon reported for replacing the amber codons by site-directed mutagenesis methods (Lajoie, M. J. et al., 2013. Science 342, 357-360) (FIG. 15C).

Our final synthetic genome was recoded using defined refactoring and recoding schemes, applying a recoding rule that we had previously determined on just 83 (0.43%) of the target codons in the genome (Wang, K. et al. 2016. Nature 539, 59-64). The recoding rule worked at 99.9% of the 1.8×10⁴ target codons in the genome, while the refactoring rules worked at 99% of overlaps.

Corrections to our initial recoding scheme were necessary at just seven of the 1.8×10⁴ target codons in the whole genome. While one of these codons was in an essential gene, the other six were within the 5′ UTRs of essential genes. Thus, all but one of the changes to our defined recoding scheme correct for unintended alterations to the 5′ UTRs of essential genes, rather than for direct effects of altered synonyms on translation.

The strategies we have developed for disconnecting a designed genome into sections, fragments, and stretches, and realizing the design through the convergent, seamless and robust integration of REXER, GENESIS and directed conjugation, provide a blueprint for future genome syntheses. In future work we will further characterize the consequences of synonymous codon compression in E. coli Syn61, and test additional recoding schemes in E. coli and other organisms. In addition, we will test sense codon reassignment for non-canonical biopolymer synthesis.

Example 7—Methods

Recoded Genome Design

We based our synthetic genome design on the sequence of the E. coli MDS42 genome (accession number AP012306.1, released 7 Oct. 2016), which has 3547 annotated CDS. We manually curated the starting genome annotation to remove three CDS and add another twelve. The three predicted CDS removed were htgA, ybbV, and yzfA; there is no evidence that these sequences encode proteins (Pundir, S., et al., 2017. Methods Mol Biol 1558, 41-55), and these sequences completely or largely overlap with better characterised genes, which would make it difficult to recode them without disrupting their overlapping genes or creating large repetitive regions. Conversely, the pseudogenes ydeU, ygaY, pbl, yghX, yghY, agaW, yhiK, yhjQ, rph, ysdC, glvG, and cybC were promoted to CDS. To enable negative selection with rpsL, we mutated the genomic copy of rpsL to rpsL^(K43R). Finally, deep sequencing of our in-house MDS42 revealed a 51 bp insertion between mrcB and hemL which had not been reported in AP012306.1. We manually introduced and annotated this insertion in our starting genome sequence.

We produced a custom Python script that i) identifies and recodes all target codons, and ii) identifies and resolves overlapping gene sequences that contain target codons. From our curated MDS42 starting sequence, we used the script to generate a new synthetic genome in which all TCG, TCA and TAG codons were replaced with AGC, AGT and TAA respectively. The script reported 91 CDS with overlaps containing target codons. In 33 instances, genes were overlapping tail-to-tail (3′, 3′) (Table 1); 12 of these could be recoded by introducing a silent mutation in the overlapping gene, while the remaining 21 were duplicated to separate the genes (FIG. 1B). 58 instances of genes overlapping head-to-tail (5′, 3′) were resolved by duplicating the overlap plus 20 bp of upstream sequence to allow endogenous expression of the downstream gene (FIG. 1C). For overlaps longer than 1 bp, an in-frame TAA was introduced to terminate expression from the original RBS for the downstream gene. prfB (release-factor RF-2) was not annotated as a CDS in our starting MDS42 genome due to its regulatory internal stop codon, and we therefore recoded all the target codons in the gene manually, thereby maintaining the internal stop codon. The resulting genome design contained 3556 CDS with 1,156,625 codons of which 18,218 were recoded (FIG. 18, SEQ ID NO: 1).
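The codon replacement rule stated above (TCG to AGC, TCA to AGT, TAG to TAA) can be sketched in a few lines of Python. This is an illustrative sketch only, not the custom script referred to in the text: the function name and interface are hypothetical, it handles a single in-frame coding sequence, and it does not attempt the overlap resolution or the manual treatment of prfB described above.

```python
from typing import Tuple

# Recoding scheme as stated in the text.
RECODING_SCHEME = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recode_cds(cds: str) -> Tuple[str, int]:
    """Recode one in-frame coding sequence.

    Returns the recoded sequence and the number of codons changed.
    Only codons in the reading frame are considered, so an out-of-frame
    occurrence of a target triplet is left untouched.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    recoded = [RECODING_SCHEME.get(codon, codon) for codon in codons]
    changed = sum(old != new for old, new in zip(codons, recoded))
    return "".join(recoded), changed


# Toy sequences (made up): ATG TCG TCA TAG, and ATG ATC GAA TAA,
# the latter containing an out-of-frame TCG only.
toy_recoded, toy_changed = recode_cds("ATGTCGTCATAG")
untouched, untouched_changed = recode_cds("ATGATCGAATAA")
```

Both replacement serine codons (AGC, AGT) are synonymous with the codons they replace, and TAA remains a stop codon, so the encoded protein sequence is unchanged by this substitution.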

Retrosynthesis of Recoded Stretches

We divided the designed genome into 37 fragments of between 91 and 136 kb. We chose the boundary sequences that delimit these fragments so that: i) they consist of a 5′-NGG-3′ PAM to allow REXER4 to be used for integration if necessary, ii) the PAM does not sit within 50 bp of a target codon, iii) the PAM is in-between non-essential genes and iv) the PAM does not disturb any annotated features such as promoters. We called the regions ~50-100 bp upstream and downstream of these boundaries ‘landing sites’, and these are annotated as Lxx, where xx is the number of the upstream fragment, e.g. L01 is the landing site between fragments 1 and 2. In our design, a landing site sequence is contained in the 3′ end of a fragment and the 5′ end of the next; as a result, all 37 fragments contain overlapping homologies of 54-155 bp with their neighbouring fragment.
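Criteria i) and ii) above lend themselves to a simple computational check, sketched below. The function names are hypothetical, and the sketch makes two simplifying assumptions: only the given strand is scanned, and distance is measured between 0-based start coordinates. Criteria iii) and iv) require genome annotation and are not modelled.

```python
# Hypothetical sketch of boundary criteria i) and ii); not the authors' tooling.

def find_pams(seq):
    """Return 0-based start indices of 5'-NGG-3' motifs on the given strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 2) if seq[i + 1:i + 3] == "GG"]


def far_from_targets(pam_start, target_starts, min_distance=50):
    """True if the PAM start is at least min_distance bp from every
    target codon start (a simplification of 'not within 50 bp')."""
    return all(abs(pam_start - t) >= min_distance for t in target_starts)


def candidate_boundaries(seq, target_starts, min_distance=50):
    """PAM positions that satisfy the distance criterion."""
    return [p for p in find_pams(seq)
            if far_from_targets(p, target_starts, min_distance)]


if __name__ == "__main__":
    demo = "A" * 60 + "TGG" + "A" * 60
    print(candidate_boundaries(demo, target_starts=[5]))
```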

Each fragment was further broken down into 7-14 stretches of 4-15 kb. We designed the stretches so that they contain overlaps of 80-200 bp with each other, and the overlap regions were defined at intergenic regions free of any recoding targets. A total of 409 stretches were synthesised (GENEWIZ, USA) and supplied in pSC101 or pST vectors flanked by BsaI, AvrII, SpeI, or XbaI restriction sites. Each synthetic stretch naturally lacked at least one of these restriction sites.

Construction of Selection Cassettes and Plasmids for REXER/GENESIS

The cloning procedures described in this section were performed in E. coli DH10b, which is resistant to streptomycin by virtue of an rpsL^(K43R) mutation. The plasmid pKW20_CDFtet_pAraRedCas9_tracrRNA used throughout this study encodes Cas9 and the lambda-red recombination components alpha/beta/gamma under the control of an arabinose-inducible promoter, as well as a tracrRNA under its native promoter, as previously described (Wang, K. et al., 2016. Nature 539, 59-64).

The protospacers for REXER are encoded in the plasmid pKW1_MB1_(Amp)_Spacer (FIG. 21A), which contains a pMB1 origin of replication, an ampicillin resistance marker and the protospacer array under the control of its endogenous promoter as previously described (Wang, K. et al., 2016. Nature 539, 59-64). From this plasmid we constructed the derivative pKW3_MB1_(Amp)_Tracr^(K)_Spacer (Table 5), which additionally contains a tracrRNA upstream of the protospacer array. For this we introduced a PCR product containing tracrRNA with its modified endogenous promoter into the BamHI site of pKW1_MB1_(Amp)_Spacer via Gibson assembly using the NEBuilder HiFi Master Mix. From this plasmid a derivative that additionally encodes Cas9 was constructed, also by Gibson assembly, and named pKW5_MB1_(Amp)_Tracr^(K)_Cas9_Spacer.

For each REXER step, a derivative of one of these three plasmids was constructed to harbour a protospacer/direct repeat array containing 2 (REXER2) or 4 (REXER4) protospacers, corresponding to the target sequences for cutting the BAC and genome. The different protospacer arrays were constructed from overlapping oligos through multiple rounds of PCR; the products were inserted by Gibson assembly between restriction sites AccI and EcoRI in the backbone of pKW1_MB1_(Amp)_Spacer, pKW3_MB1_(Amp)_Tracr^(K)_Spacer or pKW5_MB1_(Amp)_Tracr^(K)_Cas9_Spacer. The protospacer arrays resulting from each assembly were verified to be mutation-free by Sanger sequencing.

The positive-negative selection cassettes used in REXER and GENESIS are −1/+1 (rpsL-Kan^(R)), −2/+2 (sacB-Cm^(R)) and −3/+3 (pheS^(T251A_A294G)-Hyg^(R)). −1/+1 and −2/+2 are as previously described (Wang, K. et al., 2016. Nature 539, 59-64). In −3/+3, pheS^(T251A_A294G) is dominant lethal in the presence of 4-chlorophenylalanine, and Hyg^(R) confers resistance to hygromycin. Both proteins are expressed polycistronically under control of the EM7 promoter. The −3/+3 cassette was synthesised de novo. The −3/+3 cassette is also referred to as pheS*/Hyg^(R).

Construction of E. coli Strains Containing Double Selection Cassettes at Genomic Landing Sites

According to our design, each region of the genome that is targeted for replacement by a synthetic fragment is flanked by an upstream landing site and a downstream landing site; these genomic landing site sequences are the same as the landing site sequences described above. Initiation of REXER/GENESIS requires the insertion of a double selection cassette in the upstream genomic landing site. We inserted double selection cassettes at the landing sites through lambda-red mediated recombination. Briefly, either the sacB-Cm^(R) or the rpsL-Kan^(R) cassette was PCR amplified with primers containing homology regions to the genomic landing sites of interest. For recombination experiments, we prepared electrocompetent cells as described previously (Wang, K. et al., 2016. Nature 539, 59-64) and electroporated 3 μg of the purified PCR product into 100 μL of MDS42^(rpsLK43R) cells harbouring the pKW20_CDFtet_pAraRedCas9_tracrRNA plasmid expressing the lambda-red alpha/beta/gamma genes. The recombination machinery was induced, under control of the arabinose promoter (pAra), with L-arabinose added at 0.5% for 1 hour starting at OD₆₀₀=0.2. Pre-induced cells were electroporated and then recovered for 1 hour at 37° C. in 4 mL of super optimal broth (SOB) medium. Cells were then diluted into 100 mL of LB medium with 10 μg/mL tetracycline and grown for 4 hours at 37° C., 200 rpm. The cells were subsequently spun down, resuspended in 4 mL of H₂O, serially diluted, plated and incubated overnight at 37° C. on LB agar plates containing 10 μg/mL tetracycline, 18 μg/mL chloramphenicol (for sacB-Cm^(R)) or 50 μg/mL kanamycin (for rpsL-Kan^(R)).

BAC Assembly and Delivery

We constructed bacterial artificial chromosome (BAC) shuttle vectors that contained 97-136 kb of synthetic DNA. On the 5′ side, the synthetic DNA was flanked by a region of homology to the genome (HR1), and a Cas9 cut site. On the 3′ side the synthetic DNA was flanked by a double selection cassette, a region of homology to the genome (HR2), and a second Cas9 cut site. The BAC also contained a negative selection marker, a BAC origin, a URA marker and YAC origin (CEN6 centromere fused to an autonomously replicating sequence (CEN/ARS)) (FIG. 2C, FIGS. 20A-20C).

BACs were assembled by homologous recombination in S. cerevisiae. Each assembly combined i) 7-14 stretches of synthetic DNA, each 6-13 kb in length, with ii) a selection construct (see below) and iii) a BAC shuttle vector backbone (FIGS. 20A-20C, Wang, K. et al., 2016. Nature 539, 59-64).

Synthetic DNA stretches were excised from their source vectors provided by GENEWIZ by digestion at the BsaI, AvrII, SpeI, or XbaI restriction sites. In the case of AvrII, SpeI, and XbaI, restriction digests were followed by Mung Bean nuclease treatment to remove sticky ends.

Selection constructs contained a region of homology to the 3′-most stretch of the fragment, a double selection cassette (sacB-Cm^(R) or rpsL-Kan^(R)), a region of homology (HR2) to the targeted genomic locus, a negative selection marker (rpsL, sacB or pheS*-Hyg^(R)) and YAC. For specific double selection cassettes, negative selection markers, and homology region sequences see FIG. 20D. We assembled episomal versions of the selection constructs in a pSC101 backbone from 3 PCR fragments with NEBuilder HiFi DNA Assembly Master Mix.

The episomal versions were designed so that restriction digestion with BsaI yielded a DNA fragment for BAC assembly.

The BAC backbone containing a BAC origin and a URA3 marker was amplified by PCR using a previously described BAC (Wang, K. et al., 2016. Nature 539, 59-64) as a template, and the PCR product used for BAC assembly. The primers used for these PCR assemblies are listed in FIG. 20D.

To assemble the stretches, selection construct, and BAC backbone, 30-50 fmol of each piece of DNA was transformed into S. cerevisiae spheroplasts; these were prepared as previously described (Kouprina, N., et al., 2004. Methods Mol Biol 255, 69-89). Following assembly we identified yeast clones potentially harbouring correctly assembled BACs by colony PCR at the junctions of overlapping fragments and vector-insert junctions. Clones that appeared correct by colony PCR were sequence verified by NGS after transformation into E. coli, as described below.

The assembled BACs were extracted from yeast with the Gentra Puregene Yeast/Bact. Kit (Qiagen) following the manufacturer's instructions. MDS42^(rpsLK43R) cells were transformed with the assembled BAC by electroporation. Due to the large size of the BACs we sometimes observed inefficient electroporation into target cells. Consequently, we introduced an oriT-Apramycin cassette provided as a PCR product with 50 bp homology regions by lambda-red-mediated recombination (as described above) into some BACs post assembly (FIGS. 20A-20C). This facilitated transfer of BACs, from E. coli that had been successfully transformed, to other strains by conjugation.

Synthesis of Recoded Sections by REXER and GENESIS

We used various genomic and plasmid selection markers for sequential REXER experiments (GENESIS) (Table 4). We used an rpsL-Kan^(R) (−1/+1) or sacB-Cm^(R) (−2/+2) cassette at genomic landing sites for selection. We used rpsL-Kan^(R)-sacB (−1/+1,−2), rpsL-Kan^(R)-pheS*-Hyg^(R) (−1/+1,−3/+3) or sacB-Cm^(R)-rpsL (−2/+2,−1) cassettes as episomal selection markers.

For each REXER, MDS42^(rpsLK43R) cells containing pKW20_CDFtet_pAraRedCas9_tracrRNA and a double selection cassette at the relevant upstream genomic landing site were transformed with the relevant BAC. We plated cells on LB agar supplemented with 2% glucose, 5 μg/ml tetracycline and antibiotic selecting for the BAC (i.e. 18 μg/ml chloramphenicol or 50 μg/ml kanamycin). We inoculated individual colonies into LB medium with 5 μg/ml tetracycline and the BAC-specific antibiotic and grew cells overnight at 37° C., 200 rpm. The overnight culture was diluted in LB medium with 5 μg/ml tetracycline, and the BAC-specific antibiotic, to OD₆₀₀=0.05 and grown at 37° C. with shaking for about 2 h, until OD₆₀₀≈0.2. To induce lambda-red expression we added arabinose powder to the culture to a final concentration of 0.5% and then incubated the culture for one additional hour at 37° C. with shaking. We harvested the cells at OD₆₀₀=0.6, and made the cells electro-competent as described previously (Wang, K. et al., 2016. Nature 539, 59-64).

For each REXER experiment a linear dsDNA protospacer array was PCR amplified from pKW1_MB1Amp_Spacers using universal primers (FIG. 21A). Approximately 5-10 μg of the resulting DpnI-digested and purified PCR product was transformed into 100 μL electro-competent and induced cells. Cells were recovered in 4 ml SOB medium for 1 h at 37° C. and then diluted to 100 mL LB supplemented with 5 μg/mL tetracycline and antibiotic selecting for the BAC and incubated for another 4 h at 37° C. with shaking. Alternatively, electrocompetent and induced cells were transformed with 5 μg of circular protospacer array (pKW1_MB1Amp_Spacers or pKW3_MB1Amp_Spacers plasmid) and after 1 h recovery in SOB medium at 37° C. transferred into 100 mL LB supplemented with 100 μg/mL ampicillin for another 4 h at 37° C. with shaking (FIGS. 21A and 21B). If REXER2 was not sufficient we performed REXER4 using pKW5_MB1Amp_Spacers plasmid as previously described (Wang, K. et al., 2016. Nature 539, 59-64).

We spun down the culture, resuspended it in 4 ml Milli-Q filtered water and spread it in serial dilutions on selection plates of LB agar with 5 μg/ml tetracycline, an agent selecting against the negative selection marker and an antibiotic selecting for the positive marker originating from the BAC. The plates were incubated at 37° C. overnight. Multiple colonies were picked, resuspended in Milli-Q filtered water, and arrayed on several LB agar plates supplemented with 50 μg/ml kanamycin, 18 μg/ml chloramphenicol, 200 μg/ml streptomycin, 7.5% sucrose or 2.5 mM 4-chloro-phenylalanine. Colony PCR was also performed from resuspended colonies using primer pairs flanking both the genomic locus of the landing site and the position of the newly integrated selection cassette from the BAC. REXER-mediated recombination results in an approximately 500 bp band at the upstream genomic locus with a 2.5 kb (rK-landing site) or 3.5 kb (sC-landing site) band for the control MDS42^(rk)/MDS42^(sC) strain indicating successful removal of the landing site from the genome. Primer pairs flanking the 3′ end of the replaced DNA generate an approximately 2.5 kb (rK selection cassette on pBAC) or 3.5 kb (sC selection cassette on pBAC) band and a 500 bp band for the control MDS42^(rk)/MDS42^(sC) strain indicating successful integration of the selection markers.

If a plasmid-based circular protospacer array was used in the previous REXER experiment, the plasmid had to be lost before the next experiment. Thus, a successful clone from the first REXER experiment was grown in LB supplemented with 2% glucose, 5 μg/mL tetracycline and antibiotic selecting for the positive marker in the genome to a dense culture at 37° C. with shaking. 2 μL of the culture were then streaked out on an LB agar plate with the same supplements and incubated at 37° C. overnight. Several colonies were arrayed in replica on an LB agar plate and an LB agar plate supplemented with 100 μg/mL ampicillin to screen for the loss of the plasmid.

BAC Editing

When encountering loss-of-function mutations in a selection cassette on BACs in E. coli, the faulty cassette was replaced with a suitable double selection cassette provided (FIG. 20D) as a PCR-product flanked by 50 bp homology regions and integrated by lambda-red-mediated recombination.

Changes in the synthetic, recoded sequence of a BAC, either to correct spontaneous mutations or change recoded codons, were introduced by a two-step replacement approach. For BACs containing the selection cassettes −2/+2 and −1 in the end of the recoded sequence, the −3/+3 cassette was provided as a PCR-product flanked by 50 bp-homology regions targeting the desired locus and integrated by lambda-red-mediated recombination followed by selection for +3. Due to the homology between the recoded DNA and the genome, some of the resulting clones would contain −3/+3 on the BAC and some on the genome. To identify clones with the cassette on the BAC, clones were plated in replica on agar plates selecting (1) for +3, (2) against −3, and (3) for +2 and against −3. Only clones surviving on plate (1) and (2) but not on (3) have the −3/+3 cassette integrated on the BAC. The location of the cassette was verified by purifying the BAC using QIAprep Spin Miniprep Kit followed by genotyping. In a second step, the −3/+3 cassette was replaced by providing a PCR-product of the desired sequence flanked by 50 bp-homology regions and integrated by lambda-red-mediated recombination followed by selection for +2 and against −3. The BAC was genotyped as above and sequence-verified by NGS.

Preparing a Non-Transferable F′ Plasmid and Conjugative Transfer of Episomes

We created the version of the F′ plasmid used for conjugation of genomic DNA, as well as transfer of BACs between strains, to enable transfer of sequences bearing oriT without transfer of the F′ plasmid itself (FIG. 22C). We achieved this by deleting the nick-site in the origin of transfer (oriT) within the F′ plasmid itself; a related approach was previously reported (Strand, T. A., et al., 2014. PLoS One 9, e90372). The F′ plasmid derivative, pRK24 (Addgene #51950), was modified by integrating desired markers as PCR-products flanked by 50 bp-homology regions and integration was performed by lambda-red-mediated recombination using a variant of pKW20 carrying Kan^(R) instead of Tet^(R). First, the β-lactamase gene, conferring ampicillin resistance in pRK24, was replaced with the artificial T5-luxABCDE operon (Bryksin, A. V. & Matsumura, I., 2010. PLoS One 5, e13244), which generates bioluminescence that allows visual identification of infected bacterial cells. Next, Tet^(R) was replaced with T3-aac3 that produces aminoglycoside 3-N-acetyltransferase IV for selection with 50 μg/mL apramycin. Finally, a 24 bp deletion of the nick-site in oriT was made by integrating EM7-bsd that expresses blasticidin-S deaminase, and can be selected for with 50 μg/mL blasticidin in low-salt TYE/LB. The resulting F′ plasmid, called pJF146 (FIG. 22C), was extracted using QIAprep Spin Miniprep Kit (QIAgen) and transformed by electroporation into donor strains for subsequent conjugation.

Transfer of episomal DNA containing oriT was performed by conjugation (Isaacs, F. J. et al., 2011. Science 333, 348-353; and Ma, N. J., et al. 2014. Nat Protoc 9, 2285-2300). A donor strain was double transformed with pJF146 and an assembled BAC with oriT (see above). A recipient strain was transformed with pKW20. 5 ml of donor and recipient culture were grown to saturation overnight in selective LB media and subsequently washed 3 times with LB media without antibiotics. The resuspended donor and recipient strains were combined in a 4:1 ratio, spotted on TYE agar plates and incubated for 1 h at 37° C. The cells were washed off the plate and spread in serial dilutions on LB agar plates with 2% glucose, 5 μg/ml tetracycline selecting for the recipient strain and antibiotic selecting for the BAC. Successful transfer of the BAC was confirmed by colony PCR of the BAC-vector insert junctions.

Assembling a Synthetic Genome from Recoded Sections

Transfer of genomic DNA was combined with subsequent recBCD-mediated recombination to assemble partially synthetic E. coli genomes into a synthetic genome. In preparation of the donor and recipient strains a rpsL-HygR-oriT or Gm^(R)-oriT cassette was supplied as PCR product and integrated into the donor strain genome via lambda-red-mediated recombination (FIGS. 22A and 22B). Separately, a pheS*-Hyg^(R) cassette was integrated approximately 3 kb downstream of the synthetic DNA in the donor strains. This provided a template genomic DNA for PCR amplification of a 3 kb synthetic DNA segment with 3′ pheS*-Hyg^(R) selection cassette. This PCR product was provided to the recipient strains to replace the WT DNA in a lambda-red-mediated recombination. Thereby, the selection marker at the 3′ end of the synthetic segment was replaced and a 3 kb homology region to the donor synthetic DNA was generated. This strategy served to systematically generate recipient strains with 3 kb of homology with their respective donors, always with a pheS-Hyg^(R) at the 3′ end. Additionally, the donor strains were transformed with pJF146 and sensitivity to tetracycline was confirmed. In contrast, pKW20 was maintained in the donor strains to confer tetracycline resistance.

For conjugation, donor and recipient strain were grown to saturation overnight in LB medium with 2% glucose, 5 μg/ml tetracycline and 50 μg/ml kanamycin or 20 μg/ml chloramphenicol (donor) and 50 μg/ml apramycin and 200 μg/mL hygromycin B (recipient). The overnight cultures were diluted 1:10 in the same selective LB medium and grown to OD₆₀₀=0.5. 50 ml of both donor and recipient culture were washed 3 times with LB medium with 2% glucose and then each resuspended in 400 μl LB medium with 2% glucose. 320 μl of donor was mixed with 80 μl of recipient, spotted on TYE agar plates and incubated at 37° C. The incubation time depended on the length of transferred synthetic DNA and doubling time of the recipient strain and varied from 1 h to 3 h. Cells were washed off the plate and transferred into 100 ml LB medium with 2% glucose and 5 μg/ml tetracycline and incubated at 37° C. for 2 h with shaking. Subsequently 50 μg/ml kanamycin or 20 μg/ml chloramphenicol (selecting for the transferred positive selection marker of the donor) was added, followed by another 2 h incubation at 37° C. The culture was spun down and resuspended in 4 ml Milli-Q filtered water and spread in serial dilutions on selection plates of LB agar with 2% glucose, 5 μg/ml tetracycline, 2.5 mM 4-chloro-phenylalanine and 50 μg/ml kanamycin or 20 μg/ml chloramphenicol. Successful DNA transfer and recombination was determined by colony PCR for the loss of the pheS*-Hyg^(R) cassette, integration of the donor's selection cassette and absence of the Gm-oriT cassette.

Preparation of Whole-Genome and BAC Libraries for Next-Generation Sequencing

E. coli genomic DNA was purified using the DNEasy Blood and Tissue Kit (QIAgen) as per manufacturer's instructions. BACs were extracted from cells with the QIAprep Spin Miniprep Kit (QIAgen) as per manufacturer's instructions. We found that this kit was suitable for purification of BACs in excess of 130 kb. We avoided vigorous shaking of the samples throughout purification so as to reduce DNA shearing.

Paired-end Illumina sequencing libraries were prepared using the Illumina Nextera XT Kit as per manufacturer's instructions. Sequencing data were obtained on an Illumina MiSeq, running 2×300 or 2×75 cycles with the MiSeq Reagent kit v3.

Sequencing Data Analysis

The standard workflow for sequence analysis in this work is compiled in the iSeq package. In short, sequencing reads were aligned to a reference recoded or wild-type genome using bowtie2 with soft-clipping activated (Langmead, B. & Salzberg, S. L., 2012. Nat Methods 9, 357-359). Aligned reads were sorted and indexed with samtools (Li, H. et al., 2009. Bioinformatics 25, 2078-2079). A customised Python script combines functionalities of samtools and igvtools to yield a variant calling summary. This script was used to assess mutations, indels and structural variations, in combination with visual analysis in the Integrative Genomics Viewer (Thorvaldsdottir, H., et al., 2013. Brief Bioinform 14, 178-192).

We produced a custom Python script to generate recoding landscapes across a target genomic region. Briefly, the script takes a BAM alignment file, a reference in FASTA format and a GenBank annotation file as inputs. It identifies the target codons for recoding, and compiles the reads that align to these target codons in the alignment file. It then outputs the frequency of recoding at each target codon, and plots these frequencies across the length of the genomic region of interest.
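The frequency calculation at the core of a recoding landscape can be sketched as follows. This is an illustrative simplification, not the script described above: it assumes that the codons observed in reads at each target position have already been extracted from the alignment, and the function name and input layout are hypothetical. BAM parsing, annotation handling and plotting are omitted.

```python
# Hypothetical sketch of the per-codon recoding frequency; inputs are assumed
# to have been extracted from the alignment beforehand.
from collections import Counter

# Recoding scheme from the text: TCG -> AGC, TCA -> AGT, TAG -> TAA.
RECODING = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}


def recoding_frequency(observed, original):
    """Fraction of reads carrying the recoded codon at each target position.

    observed: {position: [codon seen in each read covering that position]}
    original: {position: target codon in the parent genome}
    Positions without read coverage map to None.
    """
    frequencies = {}
    for position, parent_codon in original.items():
        reads = observed.get(position, [])
        if not reads:
            frequencies[position] = None
            continue
        counts = Counter(reads)
        frequencies[position] = counts[RECODING[parent_codon]] / len(reads)
    return frequencies


if __name__ == "__main__":
    obs = {100: ["AGC", "AGC", "TCG", "AGC"], 200: ["TAA", "TAA"]}
    orig = {100: "TCG", 200: "TAG", 300: "TCA"}
    print(recoding_frequency(obs, orig))
```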

Growth Rate Measurement and Analysis

Bacterial clones were grown overnight at 37° C. in LB with 2% glucose and 100 μg/mL streptomycin. Overnight cultures were diluted 1:50 and monitored for growth while varying temperature (25° C., 37° C., or 42° C.) and media conditions (LB, LB with 2% glucose, M9 minimal media, 2×TY). Measurements of OD₆₀₀ were taken every 5 min for 18 h on a Biomek automated workstation platform with high speed linear shaking.

To determine doubling times, the growth curves were log2-transformed. At a linear phase of the curve during exponential growth, the first derivative was determined (d(log2(x))/dt) and ten consecutive time-points with the maximal log2-derivatives were used to calculate the doubling time for each replicate. A total of 10 independently grown biological replicates were measured for the recoded Syn61 strain and wt MDS42^(rpsLK43R). The mean doubling time and standard deviation from the mean were calculated for all n=10 replicates.
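One way to implement the doubling-time calculation is sketched below. It rests on an interpretation that the text does not spell out: the ten consecutive derivative values with the highest mean are averaged, and the doubling time is taken as the reciprocal of that mean slope. The function name and the finite-difference derivative are assumptions made for illustration.

```python
# Hypothetical sketch; the windowing rule is one interpretation of the text.
import math


def doubling_time(times, od600, window=10):
    """Estimate doubling time (in the units of `times`) from a growth curve.

    The curve is log2-transformed, the first derivative is approximated by
    finite differences, and the window of `window` consecutive derivative
    values with the highest mean gives the maximal growth rate.
    """
    if len(times) != len(od600):
        raise ValueError("times and od600 must have the same length")
    log2_od = [math.log2(value) for value in od600]
    slopes = [
        (log2_od[i + 1] - log2_od[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    if len(slopes) < window:
        raise ValueError("not enough time-points for the requested window")
    best = max(
        sum(slopes[i:i + window]) / window
        for i in range(len(slopes) - window + 1)
    )
    if best <= 0:
        raise ValueError("no growth detected")
    # A slope of 1 log2 unit per time unit corresponds to one doubling.
    return 1.0 / best


if __name__ == "__main__":
    t = list(range(0, 100, 5))                 # readings every 5 min
    od = [0.01 * 2 ** (x / 30) for x in t]     # synthetic 30 min doubling
    print(doubling_time(t, od))
```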

Microscopy and Cell Size Measurement

Cells were grown with shaking in LB supplemented with 100 μg/mL streptomycin to approximately OD₆₀₀=0.2. A thin layer of bacteria was sandwiched between an agarose pad and a coverslip. A standard microscope slide was prepared with a 1% agarose pad (Sigma-Aldrich A4018-5G). A sample of 2 μl to 4 μl of bacterial culture was dropped onto the top of the pad. This was covered by a #1 coverslip supported on either side by a glass spacer matched to the ~1 mm height of the pad. Samples were imaged on an upright Zeiss Axiophot phase contrast microscope using a 63× 1.25 NA Plan Neofluar phase objective (Zeiss UK, Cambridge, UK). Images were taken using an IDS ueye monochrome camera under control of ueye cockpit software (IDS Imaging Development Systems GmbH, Obersulm, Germany). 10 fields were taken of each sample. Images were loaded in Nikon NIS Elements software for further quantitation (Nikon Instruments, Surrey, UK). The General Analysis tool was used to apply an intensity threshold to segment the bacteria. A one micron lower size limit was imposed to remove background particulates and dust. Length measurements were subsequently made on the segmented bacteria using the General Analysis quantification tools.

Mass Spectrometry

Three biological replicates were performed for each strain. Proteins from each Escherichia coli lysate were solubilized in a buffer containing 6 M urea in 50 mM ammonium bicarbonate, reduced with 10 mM DTT, and alkylated with 55 mM iodoacetamide. After alkylation, proteins were diluted to 1 M urea with 50 mM ammonium bicarbonate, digested with Lys-C (Promega, UK) at a protein to enzyme ratio of 1:50 for 2 hours at 37° C., followed by digestion with Trypsin (Promega, UK) at a protein to enzyme ratio of 1:100 for 12 hours at 37° C. The resulting peptide mixtures were acidified by the addition of formic acid to a final concentration of 2% v/v. The digests were analysed in duplicate (1 μg initial protein/injection) by nano-scale capillary LC-MS/MS using an Ultimate U3000 HPLC (ThermoScientific Dionex, San Jose, USA) to deliver a flow of approximately 300 nL/min. A C18 Acclaim PepMap100 5 μm, 100 μm×20 mm nanoViper (ThermoScientific Dionex, San Jose, USA) trapped the peptides prior to separation on a C18 Acclaim PepMap100 3 μm, 75 μm×250 mm nanoViper (ThermoScientific Dionex, San Jose, USA). Peptides were eluted with a 100 minute gradient of acetonitrile (2% to 60%). The analytical column outlet was directly interfaced, via a nano-flow electrospray ionisation source, with a hybrid dual pressure linear ion trap mass spectrometer (Orbitrap Velos, ThermoScientific, San Jose, USA). Data dependent analysis was carried out, using a resolution of 30,000 for the full MS spectrum, followed by ten MS/MS spectra in the linear ion trap. MS spectra were collected over a m/z range of 300-2000. MS/MS scans were collected using a threshold energy of 35 for collision induced dissociation. All raw files were processed with MaxQuant 1.5.5.1 using standard settings and searched against an Escherichia coli strain K-12 database with the Andromeda search engine integrated into the MaxQuant software suite. Enzyme search specificity was Trypsin/P for both endoproteinases. Up to two missed cleavages for each peptide were allowed. Carbamidomethylation of cysteines was set as fixed modification with oxidized methionine and protein N-acetylation considered as variable modifications. The search was performed with an initial mass tolerance of 6 ppm for the precursor ion and 0.5 Da for CID MS/MS spectra. The false discovery rate was fixed at 1% at the peptide and protein level. Statistical analysis was carried out using the Perseus (1.5.5.3) module of MaxQuant. Prior to statistical analysis, peptides mapped to known contaminants, reverse hits and protein groups only identified by site were removed. Only protein groups identified with at least two peptides, one of which was unique, and two quantitation events were considered for data analysis. For proteins quantified at least once in each strain, the average abundance of each protein across replicates of Syn61 was divided by the abundance in MDS42 replicates, and then log2-transformed. A P-value for the difference in abundance between strains was calculated by two-sample T-test (Perseus).
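The abundance ratio described above can be written out as a short sketch. It assumes that the MDS42 abundance is also averaged across replicates, which the text implies but does not state, and the function name is hypothetical. The two-sample T-test performed in Perseus is not reproduced here.

```python
# Hypothetical sketch of the log2 abundance ratio; the T-test is omitted.
import math
from statistics import mean


def log2_abundance_ratio(syn61_replicates, mds42_replicates):
    """log2 of mean Syn61 abundance over mean MDS42 abundance for one protein.

    Assumes both inputs are non-empty lists of positive abundances.
    """
    return math.log2(mean(syn61_replicates) / mean(mds42_replicates))


if __name__ == "__main__":
    # Example values are invented for illustration only.
    print(log2_abundance_ratio([4.0, 4.0, 4.0], [1.0, 1.0, 1.0]))
```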

Toxicity of CYPK incorporation using orthogonal aminoacyl-tRNA synthetase/tRNA_(XXX) pairs (Elliott, T. S. et al., 2014. Nat Biotechnol 32, 465-472; Elliott, T. S., et al., 2016. Cell Chem Biol 23, 805-815; and Krogager, T. P. et al., 2018. Nat Biotechnol 36, 156-159)

Electrocompetent MDS42 and Syn61 cells were transformed with plasmid pKW1_MmPylS_PylT_(XXX) for expression of PylRS and tRNA^(Pyl) _(XXX), where XXX is the indicated anticodon. Three variants of this plasmid were used, with the anticodon of tRNA^(Pyl) mutated to CGA (pKW1_MmPylS_PylT_(CGA)), UGA (pKW1_MmPylS_PylT_(UGA)) or GCU (pKW1_MmPylS_PylT_(GCU)). Cells were grown overnight in LB medium with 75 μg/ml spectinomycin. Overnight cultures were diluted 1:100 into LB supplemented with Nε-(((2-methylcycloprop-2-en-1-yl) methoxy) carbonyl)-L-lysine (CYPK) at 0 mM, 0.5 mM, 1 mM, 2.5 mM and 5 mM and growth was measured as described above. “% Max Growth” was determined as the final OD₆₀₀ in the presence of the indicated concentration of CYPK divided by the final OD₆₀₀ in the absence of CYPK. Final OD₆₀₀s were determined after 600 min.
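The “% Max Growth” metric defined above reduces to a single ratio, shown here as a minimal sketch. The function name is hypothetical and the values in the example are invented for illustration; they are not measured data.

```python
# Hypothetical sketch of the "% Max Growth" calculation.

def percent_max_growth(final_od_with_cypk, final_od_without_cypk):
    """Final OD600 with CYPK as a percentage of final OD600 without CYPK."""
    if final_od_without_cypk <= 0:
        raise ValueError("reference OD600 must be positive")
    return 100.0 * final_od_with_cypk / final_od_without_cypk


if __name__ == "__main__":
    # Invented example: a culture reaching half the reference OD600.
    print(percent_max_growth(0.6, 1.2))
```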

Deletion of prfA, serU and serT by Homologous Recombination

Recoded versions of the pheS-Hyg^(R) and rpsL-Kan^(R) cassettes, according to the recoding scheme described in FIG. 1A, were synthesised de novo, so that expression of the selection proteins would not rely on decoding by serU or serT. For deleting prfA, the recoded rpsL-Kan^(R) was amplified with oligos containing ~50 bp homology to the prfA flanking genomic sequences. The same was done for serU and serT with recoded selection cassette pheS*-Hyg^(R). Oligonucleotide sequences are provided in FIG. 23. Syn61 cells harbouring the plasmid pKW20_CDFtet_pAraRedCas9_tracrRNA were made competent as described above, using 2×TY instead of LB. Cells were electroporated with ~8 μg of PCR product, and recovered for 1 hour in 4 mL SOB, then transferred to 100 mL 2×TY supplemented with 5 μg/ml tetracycline. After 4 hours cells were spun down, resuspended in 500 μL H₂O and plated in serial dilutions on 2×TY agar plates supplemented with 5 μg/ml tetracycline and 200 μg/ml hygromycin B (for pheS*-Hyg^(R)) or 50 μg/ml kanamycin (for rpsL-Kan^(R)). Deletions were verified in each case by colony PCR with primers flanking the locus of interest.

All publications mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the disclosed methods, cells, compositions and uses of the invention will be apparent to the skilled person without departing from the scope and spirit of the invention. Although the invention has been disclosed in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the disclosed modes for carrying out the invention, which are obvious to the skilled person, are intended to be within the scope of the following claims.

1. A synthetic prokaryotic genome comprising 5 or fewer occurrences ofone or more sense codons.
 2. The synthetic prokaryotic genome accordingto claim 1, wherein the synthetic prokaryotic genome comprises 100 ormore genes.
 3. The synthetic prokaryotic genome according to claim 1,wherein: (i) the synthetic prokaryotic genome is a synthetic bacterialgenome; (ii) the one or more sense codons consist of one sense codon ortwo sense codons; (iii) the synthetic prokaryotic genome comprises nooccurrences of two or more sense codons; (iv) the one or more sensecodons are selected from TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT,GCC, CTG, CTA, CTT, CTC, TTG, and TTA; and/or (v) the syntheticprokaryotic genome comprises 10 or fewer occurrences, or no occurrences,of the amber stop codon (TAG).
 4. The synthetic prokaryotic genomeaccording to claim 1, wherein the synthetic prokaryotic genome isviable.
 5. A synthetic prokaryotic genome derived from a parentprokaryotic genome, wherein the synthetic prokaryotic genome comprisesless than 10% of the occurrences of one or more sense codons, relativeto the parent prokaryotic genome, or wherein the synthetic prokaryoticgenome comprises no occurrences of one or more sense codons.
 6. Thesynthetic prokaryotic genome according to claim 5, wherein: (i) thesynthetic prokaryotic genome is a synthetic bacterial genome; (ii) theone or more sense codons are selected from TCG, TCA, TCT, TCC, AGT, AGC,GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA; (iii) 90% or moreof the occurrences of the one or more sense codons in the parentprokaryotic genome are replaced with synonymous sense codons; (iv) thesynthetic prokaryotic genome comprises 10 or fewer occurrences, or nooccurrences, of the amber stop codon (TAG); (v) 99.9% or more of theoccurrences of two or more sense codons in the parent prokaryotic genomeare replaced with synonymous sense codons, and wherein all of theoccurrences of TAG in the parent prokaryotic genome are replaced withTAA; (vi) one or more pairs of genes which share an overlapping regioncomprising the one or more sense codons in the parent prokaryotic genomeare refactored; and/or (vii) the synthetic prokaryotic genome is 100 kbto 10 Mb in size.
7. The synthetic prokaryotic genome according to claim 6, wherein for pairs of genes in opposite orientations, a synthetic insert is inserted between the genes, wherein the synthetic insert comprises the overlapping region; and/or wherein for pairs of genes in the same orientation, a synthetic insert is inserted between the genes, wherein the synthetic insert comprises: (i) a stop codon; (ii) about 20-200 bp from upstream of the overlapping region; and (iii) the overlapping region.
8. A polynucleotide comprising twenty or more essential genes with no occurrences of one or more sense codons.
9. The polynucleotide according to claim 8, wherein: (i) the one or more sense codons consist of one sense codon or two sense codons; (ii) the one or more sense codons are selected from TCG, TCA, TCT, TCC, AGT, AGC, GCG, GCA, GCT, GCC, CTG, CTA, CTT, CTC, TTG, and TTA; (iii) the occurrences of the one or more sense codons in the genes are replaced with synonymous sense codons; and/or (iv) the essential genes comprise essential genes selected from one or more of the list consisting of: ribF, lspA, ispH, dapB, folA, imp, yabO, ftsL, ftsI, murE, murF, mraY, murD, ftsW, murG, murC, ftsQ, ftsA, ftsZ, lpxC, secM, secA, can, folK, hemL, yadR, dapD, map, rpsB, tsf, pyrH, frr, dxr, ispU, cdsA, yaeL, yaeT, lpxD, fabZ, lpxA, lpxB, dnaE, accA, tilS, proS, yafF, hemB, secD, secF, ribD, ribE, thiL, dxs, ispA, dnaX, adk, hemH, lpxH, cysS, folD, entD, mrdB, mrdA, nadD, holA, rlpB, leuS, lnt, ginS, fldA, cydA, infA, cydC, ftsK, lolA, serS, rpsA, msbA, lpxK, kdsB, mukF, mukE, mukB, asnS, fabA, mviN, me, fabD, fabG, acpP, tmk, holB, IC, lolD, lolE, purB, minE, minD, pth, prsA, ispE, lolB, hemA, prfA, prmC, kdsA, topA, ribA, fabI, tyrS, ribC, ydiL, pheT, pheS, rplT, infC, thrS, nadE, gapA, yeaZ, aspS, argS, pgsA, yefM, metG, folE, yejM, gyrA, nrdA, nrdB, folC, accD, fabB, gltX, ligA, zipA, dapE, dapA, der, hisS, ispG, suhB, tadA, acpS, era, rnc, bepB, rpoE, pssA, yfiO, rplS, trmD, rpsP, ffh, grpE, csrA, ispF, ispD, ftsB, eno, pyrG, chpR, lgt, fbaA, pgk, yqgD, metK, yqgF, plsC, ygiT, parE, ribB, cca, ygjD, tdcF, yraL, yhbV, infB, nusA, ftsH, obgE, rpmA, rplU, ispB, murA, yrbB, yrbK, yhbN, rpsI, rplM, degS, mreD, mreC, mreB, accB, accC, yrdC, def, fmt, rplQ, rpoA, rpsD, rpsK, rpsM, secY, rplO, rpmD, rpsE, rplR, rplF, rpsH, rpsN, rplE, rplX, rplN, rpsQ, rpmC, rplP, rpsC, rplV, rpsS, rplB, rplW, rplD, rpbC, rpsJ, fusA, rpsG, rpsL, trpS, yrfF, asd, rpoH, ftsX, ftsE, ftsY, yhhQ, bcsB, glyQ, gpsA, rfaK, kdtA, coaD, rpmB, dfp, dut, gmk, spoT, gyrB, dnaN, dnaA, rpmH, rnpA, yidC, tnaB, glmS, glmU, wzyE, hemD, hemC, yigP, ubiB, ubiD, hemG, yihA, ftsN, murI, murB, birA, secE, nusG, rplJ, rplL, rpoB, rpoC, ubiA, plsB, bexA, dnaB, ssb, alsK, groS, psd, orn, yjeE, rpsR, chpS, ppa, valS, yjgP, yjgQ, and dnaC.
10. A prokaryotic host cell comprising the synthetic prokaryotic genome according to claim 1.
11. The prokaryotic host cell according to claim 10, wherein: (i) the prokaryotic host cell is viable; and/or (ii) the prokaryotic host cell is a bacterial cell.
12. The prokaryotic host cell according to claim 11, wherein the cell is an Escherichia coli, Salmonella enterica, or Shigella dysenteriae cell.
13. A prokaryotic host cell comprising the polynucleotide according to claim 8.
14. A method for production of polypeptides comprising one or more non-proteinogenic amino acids, the method comprising culturing the prokaryotic host cell according to claim 10 under conditions and for a time sufficient for production of polypeptides comprising one or more non-proteinogenic amino acids.
15. A method for producing a synthetic genome comprising: (a) providing a parent genome; (b) carrying out one or more rounds of recombination-mediated genetic engineering on the parent genome, to produce two or more different partially synthetic genomes; and (c) carrying out one or more rounds of directed conjugation with the two or more different partially synthetic genomes to produce a synthetic genome; wherein the partially synthetic genomes each comprise a synthetic region that has 50 or fewer occurrences, or 0 occurrences, of each of one or more sense codons; or wherein the partially synthetic genomes each comprise a synthetic region that has less than 10% of the occurrences of each of one or more sense codons, relative to the corresponding region in the parent genome.
16. The method according to claim 15, wherein: (i) the synthetic regions collectively cover 90% or greater of the parent genome; (ii) the synthetic regions are 10-1000 kb in size; (iii) the viability of the partially synthetic genomes is tested after each round of recombination-mediated genetic engineering and/or after each round of directed conjugation; (iv) the two or more different partially synthetic genomes comprise at least one partially synthetic donor genome and at least one partially synthetic recipient genome; and/or (v) the one or more rounds of recombination-mediated genetic engineering comprise one or more rounds of replicon excision for enhanced genome engineering through programmed recombination (REXER).
17. The method according to claim 16, wherein the at least one partially synthetic donor genome comprises a synthetic region and a first selectable marker flanked by two homology regions immediately downstream of an origin of transfer; and wherein the at least one partially synthetic recipient genomes comprise a second selectable marker flanked by two corresponding homology regions.
18. The method according to claim 17, wherein the synthetic region present in the at least one partially synthetic recipient genomes is outside the region flanked by the homology regions.
19. The method according to claim 17, wherein the method further comprises one or more rounds of selection for the selectable markers.
20. A synthetic prokaryotic genome produced by the method according to claim 15.